Abstract

Current work in planning with preferences assumes that the user's preference models are completely
specified and aims to search for a single solution plan. In many real-world planning scenarios,
however, the user probably cannot provide any information about her desired plans, or in some cases
can only express partial preferences. In such situations, the planner has to present not just one
plan but a set of plans to the user, in the hope that some of them are similar to the plan she
prefers. We first propose different measures to capture the quality of plan sets that are suitable
for such scenarios: domain-independent distance measures defined on plan elements (actions, states,
causal links) if no knowledge of the user's preferences is given, and the Integrated Convex
Preference measure if the user's partial preference is provided. We then investigate various
heuristic approaches to finding sets of plans according to these measures, and present empirical
results demonstrating the promise of our approach.
1. Introduction

Most work in automated planning takes as input a complete specification of domain models and/or
user preferences, and the planner searches for a single solution satisfying the goals, probably
optimizing some objective function. In many real-world planning scenarios, however, the user's
preferences on desired plans are either unknown or at best partially specified (c.f. Kambhampati).
In such cases the planner's job changes from finding a single optimal plan to finding a set of
representative solutions ("options") and presenting them to the user in the hope that she will find
one of them desirable. As an example, in adaptive web services composition, the causal dependencies
among some web services might change at execution time, and as a result the web service engine
wants to have a set of diverse plans/compositions such that if there is a failure while executing
one composition, an alternative may be used which is less likely to fail simultaneously (Chafle et
al., 2006). However, if a user is helping to select the compositions, the planner could first be
asked for a set of plans that take into account the user's trust in some particular sources, and
when she selects one of them, it is next asked to find plans that are similar to the selected one.
The requirement of searching for a set of plans is also considered in intrusion detection (Boddy et
al., 2005), where a security analyst needs to analyze a set of attack plans that might be attempted
by a potential adversary, given limited (or unknown) information about the adversary's model (e.g.,
his goals, capabilities, habits, ...), and the resulting information can then be used to set up
defensive strategies against potential attacks in the future. Another example can be found in Memon
et al. (2001), in which test cases for graphical user interfaces (GUIs) are generated as a set of
distinct plans, each corresponding to a sequence of actions that a user could perform, given the
user's unknown preferences on how to interact with the GUI to achieve her goals. The capability of
synthesizing multiple plans would also have potential application in case-based planning (e.g.,
Serina (2010)), where it is important to have a plan set satisfying a case instance. These plans can
differ in terms of criteria such as resources, makespan and cost that can only be specified in the
retrieval phase. In the problem of travel planning for individuals of a city in a distributed manner
while also optimizing public resources (e.g., roads, traffic police personnel), the availability of
a number of plans for each person's goals could make the plan merging phase easier and reduce the
conflicts among individual plans.

In this work, we investigate the problem of generating a set of plans in order to deal with 
planning situations where the preference model is not completely specified. In particular, we 
consider the following scenarios: 

• Even though the planner is aware that the user has some preferences on solution plans, it
is not provided with any of that knowledge.

• The planner is provided with incomplete knowledge of the user's preferences. In particular,
the user is interested in some plan attributes (such as the duration and cost of a flight, or
whether all packages with priority are delivered on time in a logistics domain), each with a
different but unknown degree of importance (represented by weight or trade-off values).
Normally, it is quite hard for a user to indicate the exact trade-off values; she is instead more
likely to indicate that one attribute is more (or less) important than some others. For
instance, a businessman would consider the duration of a flight much more important
than its cost. This kind of incomplete preference specification can be modeled with a
probability distribution of weight values,² and is therefore assumed to be given as an input
(together with the attributes) to the planner.

Even though, in principle, the user would have a better chance of finding her desired plan in a
larger plan set, there are two problems to consider: one computational, the other comprehensional.
The computational problem is that synthesis of a single plan is often quite costly already,
and therefore it is even more challenging to search for a large plan set. As for the second
problem, it is unclear that the user will be able to inspect a large set of plans to identify the
plan she prefers. What is clearly needed, therefore, is the ability to generate a set of plans,
among all sets of a bounded (small) number of plans, with the highest chance of including the user's
preferred plan. An immediate challenge is formalizing what it means for a set of plans to be
meaningful, in other words what the quality measure of plan sets should be given an incomplete
preference specification.

2 Even if we do not have any special knowledge about this probability distribution, we can always
start by initializing it to be uniform, and gradually improve it based on interaction with the user.

We propose different quality measures for the two scenarios listed above. In the extreme 
case when the user could not provide any knowledge of her preferences, we define a spectrum 
of distance measures between two plans based on their syntactic features in order to define the 
diversity measure of plan sets. These measures can be used regardless of the user's preference, 
and by maximizing the diversity of a plan set we increase the chance that the set is uniformly 
distributed in the unknown preference space, and therefore likely contains a plan that is close to 
a user's desired one. 

This measure can be further refined when some knowledge of the user's preferences is provided.
As mentioned above, we assume that the user's preference is specified by a convex combination
of plan attributes, and is incomplete in the sense that the distribution of trade-off weights
is given, not their exact values. The whole set of best plans (i.e. the ones with the best value
function) can be pictured as the lower convex hull of the Pareto set in the attribute space. To
measure the quality of any (bounded) set of plans with respect to the whole optimal set, we adapt
the idea of the Integrated Preference Function (IPF) (Carlyle et al., 2003), in particular its
special case, Integrated Convex Preference (ICP). This measure was developed in the Operations
Research (OR) community in the context of multi-criteria scheduling, and is able to associate a
robust measure of representativeness with any set of solution schedules (Fowler et al., 2005).

Armed with these quality measures, we can then formulate the problem of planning with
partial preference models as finding a bounded set of plans that has the best quality value. Our
next contribution therefore is to investigate effective approaches for using quality measures to
bias a planner's search to find a high quality plan set efficiently. For the first scenario, when
the preference specification is not provided, two representative state-of-the-art planning
approaches are considered. The first, GP-CSP (Do and Kambhampati), typifies the issues involved
in generating diverse plans in bounded horizon compilation approaches, while the second, LPG
(Gerevini et al., 2003), typifies the issues involved in modifying heuristic search planners.
Our investigations with GP-CSP allow us to compare the relative difficulties of enforcing diversity
with each of the three different distance measures (elaborated in a later section). With LPG, we
find that the proposed quality measure makes it more effective in generating plan sets over large
problem instances. For the second case, when part of the user's preferences is provided, we
also present a spectrum of approaches for solving this problem efficiently. We implement these
approaches on top of Metric-LPG (Gerevini et al., 2008b). Our empirical evaluation compares
these approaches both among themselves and against the methods for generating diverse
plans that ignore the partial preference information, and the results demonstrate the promise of
our proposed solutions.

Our work can be considered as a complement to current research in planning with preferences,
as shown in Figure 1. From the perspective of planning with preferences, most current
work in planning synthesizes a single solution plan, or a single best one, in situations where the
user has no preferences, or where complete knowledge of preferences is given to the planner. On the
other hand, we address the problem of synthesizing a set of plans when knowledge of the user's
preferences is either completely unknown or partially specified.

The paper is organized as follows. Section 2 gives fundamental concepts in preferences, and
formal notations. In Section 3, we formalize quality measures of plan sets in the two scenarios.
Sections 4 and 5 discuss our various heuristic approaches to generate plan sets, together with
the experimental results. We discuss related work in Section 6, and future work and conclusion in
Section 7.

Figure 1: An overview picture of planning with respect to knowledge of user's preferences.



2. Background and Notation 

Given a planning problem with the set of solution plans $S$, a user preference model is a transitive,
reflexive relation in $S \times S$, which defines an ordering between two plans $p$ and $p'$ in $S$.
Intuitively, $p \preceq p'$ means that the user prefers $p$ at least as much as $p'$. Note that this
ordering can be either partial (i.e. it is possible that neither $p \preceq p'$ nor $p' \preceq p$
holds, in which case the two plans are incomparable), or total (i.e. either $p \preceq p'$ or
$p' \preceq p$ holds). A plan $p$ is considered more preferred than a plan $p'$, denoted by
$p \prec p'$, if $p \preceq p'$ and $p' \not\preceq p$, and they are equally preferred if
$p \preceq p'$ and $p' \preceq p$. A plan $p$ is an optimal (i.e., most preferred) plan if
$p \preceq p'$ for any other plan $p'$. A plan set $\mathcal{P} \subseteq S$ is considered more
preferred than $\mathcal{P}' \subseteq S$, denoted by $\mathcal{P} \prec \mathcal{P}'$, if
$p \prec p'$ for any $p \in \mathcal{P}$ and $p' \in \mathcal{P}'$, and they are incomparable if
there exist $p \in \mathcal{P}$ and $p' \in \mathcal{P}'$ such that $p$ and $p'$ are incomparable.

The ordering $\preceq$ implies a partition of $S$ into disjoint plan sets (or classes)
$S_0, S_1, \ldots$ ($S_0 \cup S_1 \cup \ldots = S$, $S_i \cap S_j = \emptyset$) such that plans in the
same set are equally preferred, and for any sets $S_i$, $S_j$, either $S_i \prec S_j$,
$S_j \prec S_i$, or they are incomparable. The partial ordering between these sets can be
represented as a Hasse diagram (Birkhoff, 1948) where the sets are vertices, and there is an
(upward) edge from $S_j$ to $S_i$ if $S_i \prec S_j$ and there is no $S_k$ in the partition such
that $S_i \prec S_k \prec S_j$. We denote by $l(S_i)$ the "layer" of the set $S_i$ in the diagram,
assuming that the most preferred sets are placed at layer 0, and $l(S_j) = l(S_i) + 1$ if there is
an edge from $S_j$ to $S_i$. A plan in a set at a layer with a smaller value is, in general, either
more preferred than or incomparable with plans at layers with higher values.³ Figure 2 shows
examples of Hasse diagrams representing a total and a partial preference ordering between plans.

Figure 2: The Hasse diagrams and layers of plan sets implied by two preference models. In (a),
$S_1 \prec S_2 \prec S_3$, and any two plans are comparable. In (b), on the other hand,
$S_1 \prec S_2 \prec S_4$, $S_1 \prec S_3$, and each plan in $S_3$ is incomparable with plans in
$S_2$ and $S_4$.
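The construction of the diagram can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `hasse_layers`, the encoding of the strict order as a set of pairs, and the choice to assign each class the length of the longest chain of edges leading to it (which agrees with the layer definition on the small example used here) are all assumptions made for the sketch.

```python
from itertools import product

def hasse_layers(classes, prefers):
    """classes: class names; prefers: pairs (a, b) meaning class a is strictly
    preferred to class b. Returns (covering edges, layer of each class)."""
    # Transitive closure of the strict preference between classes.
    closure = set(prefers)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    # Keep (a, b) only if no class k lies strictly between a and b.
    edges = {(a, b) for (a, b) in closure
             if not any((a, k) in closure and (k, b) in closure for k in classes)}
    # Assumption: layer = longest chain of covering edges from a most preferred class.
    layers = {}
    def layer(c):
        if c not in layers:
            better = [a for (a, b) in edges if b == c]
            layers[c] = 0 if not better else 1 + max(layer(a) for a in better)
        return layers[c]
    for c in classes:
        layer(c)
    return edges, layers

# The partial ordering of Figure 2(b): S1 < S2 < S4 and S1 < S3.
edges, layers = hasse_layers(["S1", "S2", "S3", "S4"],
                             {("S1", "S2"), ("S2", "S4"), ("S1", "S3")})
print(layers)
```

On this input the pair (S1, S4) is derived by transitivity but is not an edge of the diagram, because S2 lies between the two classes.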

When the preference model is explicitly specified, answering queries such as comparing two 
plans, finding a most preferred (optimal) plan becomes an easy task. This is possible, however, 
only if the set of plans is small and known upfront. Many preference languages, therefore, have 
been proposed to represent the relation $\preceq$ in a more compact way, and serve as starting points for
algorithms to answer queries. Most preference languages fall into the following two categories: 

• Quantitative languages define a value function $V : S \to \mathbb{R}$ which assigns a real number
to each plan, with a precise interpretation that $p \preceq p' \iff V(p) \le V(p')$. Although
this function is defined differently in many languages, at a high level it combines the
user's preferences on various aspects of a plan that can be measured quantitatively. For
instance, in the context of decision-theoretic planning (Boutilier et al., 1999), the value
function of a policy is defined as the expected rewards of states that are visited when the
policy executes. In partial satisfaction (over-subscription) planning (PSP) (Smith, 2004;
Van Den Briel et al., 2004), the quality of a plan is defined as the total reward of soft goals
achieved minus its total action cost. In PDDL2.1 (Fox and Long, 2003), the value function
is an arithmetic function of numerical fluents such as plan makespan, fuel used, etc.,
and in PDDL3 (Gerevini et al., 2009) it is enhanced with individual preference specifications
defined as formulae over state trajectories using linear temporal logic (LTL) (Pnueli, 1977).



• Qualitative languages provide qualitative statements that are more intuitive for lay users to
specify. A commonly used language of this type is CP-networks (Boutilier et al., 2004),
where the user can specify her preference statements on values of plan attributes, possibly
given specification of others (for instance, "Among tickets with the same prices, I prefer
airline A to airline B."). Another example is LPP (Bienvenu et al., 2006), in which the
statements can be specified using LTL formulae, and possibly aggregated in different ways.

3 If $\preceq$ is a total ordering, then plans at a smaller layer are more preferred than ones at a
higher layer.

Figure 3: The metamodel (Brafman and Domshlak, 2009).

Figure 3 shows the conceptual relation of preference models, languages and algorithms. We
refer the reader to the work by Brafman and Domshlak (2009) for a more detailed discussion
of this metamodel, and by Baier and McIlraith (2009) for an overview of different preference
languages used in planning with preferences.

From the modeling point of view, in order to design a suitable language capturing the user's 
preference model, the modeler should be provided with some knowledge of the user's interest 
that affects the way she evaluates plans (for instance, flight duration and ticket cost in a travel 
planning scenario). Such knowledge in many cases, however, cannot be completely specified. 
Our purpose therefore is to present a bounded set of plans to the user in the hope that it will 
increase the chance that she can find a desired plan. In the next section, we formalize the quality 
measures for plan sets in two situations where either no knowledge of the user's preferences or 
only part of them is given. 

3. Quality Measures for Plan Sets 

3.1. Syntactic Distance Measures for Unknown Preference Cases 

We first consider the situation in which the user has some preferences for solution plans, but the 
planner is not provided with any knowledge of such preferences. It is therefore impossible for the 
planner to assume any particular form of preference language representing the hidden preference 
model. There are two issues that need to be considered in formalizing a quality measure for plan 
sets: 

• What are the elements of plans that can be involved in a quality measure? 

• How should a quality measure be defined using those elements? 

For the first question, we observe that even though users are normally interested in some 
high level features of plans that are relevant to them, many of those features can be considered as
"functions" of base level elements of plans. For instance, the set of actions in the plan determines
the makespan of a (sequential) plan, and the sequence of states when the plan executes gives the 
total reward of goals achieved. We consider the following three types of base level features of 
plans which could be used in defining quality measure, independently of the domain semantics: 

• Actions that are present in plans, which define various high level features of the plans such 
as their makespan, execution cost, etc., that are of interest to the user whose preference model
could be represented with preference languages such as in PSP and PDDL2.1. 
Basis        | Pros                                                                                                 | Cons
-------------|------------------------------------------------------------------------------------------------------|------------------------------------------------
Actions      | Does not require problem information                                                                 | No problem information is used
States       | Not dependent on any specific plan representation                                                    | Needs an execution simulator to identify states
Causal links | Considers causal proximity of state transitions (action) rather than positional (physical) proximity | Requires domain theory

Table 1: The pros and cons of different base level elements of plan.



• Sequence of states that the agent goes through, which captures the behaviors resulting
from the execution of plans. In many preference languages defined using high level features
of plans such as the reward of goals collected (e.g., PSP), of the whole state (e.g.,
MDP), or the temporal relation between propositions occurring in states (e.g. PDDL3, PP
(Son and Pontelli) and LPP (Fritz and McIlraith)), the sequence of states can
affect the quality of a plan as evaluated by the user.

• The causal links representing how actions contribute to the goals being achieved, which
measure the causal structures of plans.⁴ These plan elements can affect the quality of
plans with respect to the languages mentioned above, as the causal links capture both the
actions appearing in a plan and the temporal relation between actions and variables.

A similar conceptual separation of features has also been considered recently in the context
of case-based planning by Serina (2010), in which planning problems were assumed to be well
classified, in terms of costs to adapt plans of one problem to solve another, in some unknown
high level feature space. The similarity between problems in the space was implicitly defined
using kernel functions of their domain-independent graph representations. In our situation, we
aim to approximate the quality of plan sets on the space of features that the user is interested
in, using the distance between plans with respect to the base level features of plans mentioned
above (see below).

Table 1 gives the pros and cons of using the different base level elements of plan. We note
that if actions in the plans are used in defining the quality measure of plan sets, no additional
problem or domain theory information is needed. If plan behaviors are used as base level elements,
the representation of the plans that bring about state transitions becomes irrelevant, since only
the actual states that an execution of the plan will go through are considered. Hence, we can now
compare plans of different representations, e.g., four plans where the first is a deterministic
plan, the second is a contingent plan, the third is a hierarchical plan and the fourth is a policy
encoding probabilistic behavior. If causal links are used, then the causal proximity among actions
is considered rather than just physical proximity in the plan.

Given those base level elements, the next question is how to define a quality measure of plan
sets using them. Recall that without any knowledge about the user's preferences, there is no way
for the planner to assume any particular preference language; the motivation behind a choice of
quality measure should therefore come from the hidden user's preference model. Given
a Hasse diagram induced from the user's preference model, a $k$-plan set that will be presented to
the user can be considered to be randomly selected from the diagram. The probability of having
one plan in the set classified in a class at the optimal layer would increase when the individual
plans are more likely to be at different layers, and this chance in turn will increase if they are
less likely to be equally preferred by the user.⁵ On the other hand, the effect of base level
elements of a plan on high level features relevant to the user suggests that plans similar with
respect to base level features are more likely to be close to each other on the high level feature
space determining the user's preference model.

4 A causal link $a_1 \to p - a_2$ records that a predicate $p$ is produced by $a_1$ and consumed by
$a_2$.
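The claim that plans at different layers improve the chance of hitting the optimal layer can be checked by brute-force enumeration on a small diagram. The sketch below uses the five-plan diagram of the accompanying footnote (two plans at layer 0, two at layer 1, one at layer 2); the dictionary `LAYER` and the helper `chance_of_optimal` are illustrative names, not part of the paper.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical encoding of the diagram: plan name -> layer in the Hasse diagram.
LAYER = {"p1": 0, "p2": 0, "p3": 1, "p4": 1, "p5": 2}

def chance_of_optimal(pairs):
    """Fraction of the given 2-plan sets that contain a plan at layer 0."""
    pairs = list(pairs)
    hits = sum(1 for a, b in pairs if LAYER[a] == 0 or LAYER[b] == 0)
    return Fraction(hits, len(pairs))

all_pairs = list(combinations(LAYER, 2))
same_layer = [(a, b) for a, b in all_pairs if LAYER[a] == LAYER[b]]
diff_layer = [(a, b) for a, b in all_pairs if LAYER[a] != LAYER[b]]

print(chance_of_optimal(same_layer))  # 1/2
print(chance_of_optimal(diff_layer))  # 3/4
```

Only 2 of the 10 possible pairs lie within a single layer, and one of them is optimal; of the 8 pairs spanning two layers, 6 contain a layer-0 plan.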

In order to define a quality measure using base level features of plans, we proceed with the
following assumption: plans that are different from each other with respect to the base level
features are less likely to be equally preferred by the user; in other words, they are more likely
to be at different layers of the Hasse diagram. With the purpose of increasing the chance of having
a plan that the user prefers, we propose the quality measure of a plan set as its diversity measure,
defined using the distance between two plans in the set with respect to a base level element. More
formally, the quality measure $\zeta : 2^S \to \mathbb{R}$ of a plan set $\mathcal{P}$ can be defined
as either the minimal, maximal, or average distance between plans:

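• Minimal distance:

$$\zeta_{\min}(\mathcal{P}) = \min_{p, p' \in \mathcal{P}} \delta(p, p') \tag{1}$$

• Maximal distance:

$$\zeta_{\max}(\mathcal{P}) = \max_{p, p' \in \mathcal{P}} \delta(p, p') \tag{2}$$

• Average distance:

$$\zeta_{avg}(\mathcal{P}) = \binom{|\mathcal{P}|}{2}^{-1} \times \sum_{p, p' \in \mathcal{P}} \delta(p, p') \tag{3}$$

where $\delta : S \times S \to [0, 1]$ is the distance measure between two plans.
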
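As a minimal sketch of the three set-level measures (minimal, maximal and average pairwise distance), the Python below takes the distance function as a parameter. It assumes that the measures range over unordered pairs of distinct plans and that the set holds at least two plans; the function names and the toy `action_distance` are hypothetical and stand in for whichever base level distance is chosen.

```python
from itertools import combinations

def pair_distances(plans, dist):
    """Distances over all unordered pairs of distinct plans (assumed reading)."""
    return [dist(p, q) for p, q in combinations(plans, 2)]

def zeta_min(plans, dist):
    return min(pair_distances(plans, dist))

def zeta_max(plans, dist):
    return max(pair_distances(plans, dist))

def zeta_avg(plans, dist):
    d = pair_distances(plans, dist)
    return sum(d) / len(d)  # len(d) equals |P| choose 2

def action_distance(p, q):
    """Hypothetical distance in [0, 1]: share of actions not common to both plans."""
    return 1.0 - len(p & q) / len(p | q)

# Three toy plans, each represented only by its set of action names.
plan_set = [frozenset({"a1", "a2", "a3"}),
            frozenset({"a1", "a2", "a4"}),
            frozenset({"a5", "a6"})]

print(zeta_min(plan_set, action_distance),
      zeta_max(plan_set, action_distance),
      zeta_avg(plan_set, action_distance))
```

The minimal variant is the most conservative of the three, since a single pair of near-identical plans is enough to drive it down, while the average can hide such a pair behind other, more distant plans.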
3.1.1. Distance measures between plans 

There are various choices on how to define the distance measure $\delta(p, p')$ between two plans using
plan actions, sequence of states or causal links, and each way can have different impact on the 
diversity of plan set on the Hasse diagram. In the following, we propose distance measures in 
which a plan is considered as (i) a set of actions and causal links, or (ii) sequence of states the 
agent goes through, which could be used independently of plan representation (e.g. total order, 
partial order plans). 



5 To see this, consider a diagram with $S_1 = \{p_1, p_2\}$ at layer 0, $S_2 = \{p_3\}$ and
$S_3 = \{p_4\}$ at layer 1, and $S_4 = \{p_5\}$ at layer 2. Assume that we randomly select a set of
2 plans. If those plans are known to be at the same layer, then the chance of having one plan at
layer 0 is 1/2. However, if they are forced to be at different layers, then the probability will
be 3/4.
• Plan as a set of actions or causal links: given a plan $p$, let $A(p)$ and $C(p)$ be the set of
actions or causal links of $p$. The distance between two plans $p$ and $p'$ can be defined as the
ratio of the number of actions (causal links) that do not appear in both plans to the total
number of actions (causal links) appearing in at least one of them:

$$\delta_A(p, p') = 1 - \frac{|A(p) \cap A(p')|}{|A(p) \cup A(p')|} \tag{4}$$

$$\delta_{CL}(p, p') = 1 - \frac{|C(p) \cap C(p')|}{|C(p) \cup C(p')|} \tag{5}$$

• Plan as a sequence of states: given two sequences of states $(s_0, s_1, \ldots, s_k)$ and
$(s'_0, s'_1, \ldots, s'_{k'})$ resulting from executing two plans $p$ and $p'$, assume that
$k' \le k$. Since the two sequences of states may have different lengths, there are various options
in defining the distance measure between $p$ and $p'$, and we consider here two simple options. In
the first one, it can be defined as the average of the distances between state pairs
$(s_i, s'_i)$ ($1 \le i \le k'$), and each state $s_{k'+1}, \ldots, s_k$ is considered to contribute
maximally (i.e. one unit) to the difference between the two plans:

$$\delta_S(p, p') = \frac{1}{k} \times \left[ \sum_{i=1}^{k'} \Delta(s_i, s'_i) + k - k' \right] \tag{6}$$

On the other hand, we can assume that the agent continues to stay at the goal state $s'_{k'}$ in
the next $(k - k')$ time steps after executing $p'$, and the measure can be defined as follows:

$$\delta_S(p, p') = \frac{1}{k} \times \left[ \sum_{i=1}^{k'} \Delta(s_i, s'_i) + \sum_{i=k'+1}^{k} \Delta(s_i, s'_{k'}) \right] \tag{7}$$

The distance measure $\Delta(s, s')$ between two states $s$, $s'$ used in those two measures is
defined as

$$\Delta(s, s') = 1 - \frac{|s \cap s'|}{|s \cup s'|} \tag{8}$$

Example: Figure 4 shows three plans $p_1$, $p_2$ and $p_3$ for a planning problem where the initial
state is $\{r_1\}$ and the goal propositions are $\{r_3, r_4\}$. The specification of actions is
shown in the table. The action sets of the first two plans ($\{a_1, a_2, a_3\}$ and
$\{a_1, a_2, a_4\}$) are quite similar ($\delta_A(p_1, p_2) = 0.5$), but the causal links which
involve $a_3$ ($a_2 \to r_3 - a_3$, $a_3 \to r_4 - a_G$) and $a_4$ ($a_I \to r_1 - a_4$,
$a_4 \to r_4 - a_G$) make their difference more significant with respect to the causal-link based
distance $\delta_{CL}(p_1, p_2)$. Two other plans $p_1$ and $p_3$, on the other hand, are very
different in terms of action sets (and therefore the sets of causal links):
$\delta_A(p_1, p_3) = 1$, but they are closer in terms of state-based distance (0.5 if defined as
in Equation 7).
Figure 4: Example illustrating plans with base-level elements. $a_I$ and $a_G$ denote dummy actions
producing the initial state and consuming the goal propositions, respectively (see text for more
details).
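The distance measures of Section 3.1.1 can be sketched as follows. This is an illustrative reading rather than the authors' code: plans are reduced to sets of action (or causal link) names, states to sets of proposition names, and the state-to-state distance is assumed to be the same ratio-based measure used for action sets. All function names are hypothetical.

```python
def jaccard_distance(x, y):
    """1 - |x & y| / |x | y|; used for action sets, causal-link sets and,
    by assumption, for states viewed as sets of propositions."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        return 0.0
    return 1.0 - len(x & y) / len(union)

def _longer_first(seq_p, seq_q):
    return (seq_p, seq_q) if len(seq_p) >= len(seq_q) else (seq_q, seq_p)

def delta_states_pad(seq_p, seq_q, state_dist=jaccard_distance):
    """First option: every unmatched trailing state of the longer sequence
    contributes one full unit of distance."""
    longer, shorter = _longer_first(seq_p, seq_q)
    k, k2 = len(longer) - 1, len(shorter) - 1  # index 0 is the initial state
    if k == 0:
        return 0.0
    shared = sum(state_dist(longer[i], shorter[i]) for i in range(1, k2 + 1))
    return (shared + (k - k2)) / k

def delta_states_stay(seq_p, seq_q, state_dist=jaccard_distance):
    """Second option: the shorter plan is assumed to stay in its final state."""
    longer, shorter = _longer_first(seq_p, seq_q)
    k, k2 = len(longer) - 1, len(shorter) - 1
    if k == 0:
        return 0.0
    shared = sum(state_dist(longer[i], shorter[i]) for i in range(1, k2 + 1))
    tail = sum(state_dist(longer[i], shorter[k2]) for i in range(k2 + 1, k + 1))
    return (shared + tail) / k

# Action sets of the first two plans in the example above.
print(jaccard_distance({"a1", "a2", "a3"}, {"a1", "a2", "a4"}))  # 0.5

# Hypothetical state sequences (not taken from Figure 4).
seq_a = [{"r1"}, {"r1", "r2"}, {"r1", "r2", "r3"}]
seq_b = [{"r1"}, {"r1", "r2"}]
print(delta_states_pad(seq_a, seq_b), delta_states_stay(seq_a, seq_b))
```

On the toy sequences the first option charges a full unit for the extra state of the longer plan, while the second only charges for the single proposition in which that state differs from the final state of the shorter plan.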



3.2. Integrated Preference Function (IPF) for Partial Preference Cases

We now discuss a quality measure for plan sets in the case when the user's preference is partially
expressed. In particular, we consider scenarios in which the preference model can be represented
by some quantitative language with an incompletely specified value function of high level
features. As an example, the quality of plans in PDDL2.1 (Fox and Long, 2003) and PDDL3
(Gerevini and Long, 2005) is represented by a metric function combining metric fluents and
preference statements on state trajectory with parameters representing their relative importance.
While providing a convenient way to represent preference models, such parameterized value
functions present an issue of obtaining reasonable values for the relative importance of the
features. A common approach to model this type of incomplete knowledge is to consider those
parameters as a vector of random variables, whose values are assumed to be drawn from a
distribution. This is the representation that we will follow.

To measure the quality of plan sets, we propose the usage of the Integrated Preference Function
(IPF) (Carlyle et al., 2003), which has been used to measure the quality of a solution set in a
wide range of multi-objective optimization problems. The IPF measure assumes that the user's
preference model can be represented by two factors: (1) a probability distribution $h(\alpha)$ of
the parameter vector $\alpha$ such that $\int_{\alpha} h(\alpha)\, d\alpha = 1$ (in the absence of
any special information about the distribution, $h(\alpha)$ can be assumed to be uniform), and (2) a
value function $V(p, \alpha) : S \to \mathbb{R}$ that combines different objective functions into a
single real-valued quality measure for plan $p$. This incomplete specification of the value function
represents a set of candidate preference models, for each of which the user will select a different
plan, the one with the best value, from a given plan set $\mathcal{P} \subseteq S$. The IPF value of
solution set $\mathcal{P}$ is defined as:

$$IPF(\mathcal{P}) = \int_{\alpha} h(\alpha)\, V(p_\alpha, \alpha)\, d\alpha \tag{9}$$

where $p_\alpha = \arg\min_{p \in \mathcal{P}} V(p, \alpha)$ is the best solution according to
$V(p, \alpha)$ for each given $\alpha$ value. Let $p_\alpha^{-1}$ be its inverse function
specifying a range of $\alpha$ values for which $p$ is an optimal solution according to
$V(p, \alpha)$. As $p_\alpha$ is piecewise constant, the $IPF(\mathcal{P})$ value can be computed as:

$$IPF(\mathcal{P}) = \sum_{p \in \mathcal{P}} \left[ \int_{\alpha \in p_\alpha^{-1}} h(\alpha)\, V(p, \alpha)\, d\alpha \right] \tag{10}$$

Let $\mathcal{P}^* = \{p \in \mathcal{P} : p_\alpha^{-1} \neq \emptyset\}$; then we have:

$$IPF(\mathcal{P}) = IPF(\mathcal{P}^*) = \sum_{p \in \mathcal{P}^*} \left[ \int_{\alpha \in p_\alpha^{-1}} h(\alpha)\, V(p, \alpha)\, d\alpha \right] \tag{11}$$

Since $\mathcal{P}^*$ is the set of plans that are optimal for some specific parameter vector,
$IPF(\mathcal{P})$ can now be interpreted as the expected value that the user can get by selecting
the best plan in $\mathcal{P}$. Therefore, the set $\mathcal{P}^*$ of solutions (known as the lower
convex hull of $\mathcal{P}$) with the minimal IPF value is most likely to contain the desired
solutions that the user wants, and is in essence a good representative of the plan set
$\mathcal{P}$.

While our work is applicable to more general planning scenarios, to make our discussion on
generating plan sets concrete, we will concentrate on metric temporal planning where each action
$a \in A$ has a duration $d_a$ and execution cost $c_a$. The planner needs to find a plan
$p = \{a_1 \ldots a_n\}$, which is a sequence of actions that is executable and achieves all goals.
The two most common plan quality measures are makespan, which is the total execution time of $p$,
and plan cost, which is the total execution cost of all actions in $p$. Both are high level
features that can be affected by the actions in the plan. In most real-world applications, these
two criteria compete with each other: shorter plans usually have higher cost and vice versa. We use
the following assumptions:

• The desired objective function involves minimizing both components: $time(p)$ measures
the makespan of the plan $p$ and $cost(p)$ measures its execution cost.

• The quality of a plan $p$ is a convex combination: $V(p, w) = w \times time(p) + (1 - w) \times
cost(p)$, where the weight $w \in [0, 1]$ represents the trade-off between the two competing
objective functions.

• The belief distribution of $w$ over the range $[0, 1]$ is known. If the user does not provide any
information or we have not learnt anything about the preference on the trade-off between
time and cost of the plan, then the planner can assume a uniform distribution (and improve
it later using techniques such as preference elicitation).

Given that the exact value of $w$ is unknown, our purpose is to find a bounded representative
set of non-dominated plans⁶ minimizing the expected value of $V(p, w)$ with regard to the given
distribution of $w$ over $[0, 1]$.

6 A plan $p_1$ is dominated by $p_2$ if $time(p_1) \ge time(p_2)$ and $cost(p_1) \ge cost(p_2)$ and
at least one of the inequalities is strict.

Example: Figure 5 shows our running example in which there are a total of 7 plans with their
$time(p)$ and $cost(p)$ values as follows: $p_1 = \{4, 25\}$, $p_2 = \{6, 22\}$, $p_3 = \{7, 15\}$,
$p_4 = \{8, 20\}$, $p_5 = \{10, 12\}$, $p_6 = \{11, 14\}$, and $p_7 = \{12, 5\}$. Among these 7
plans, 5 belong to a Pareto optimal set of non-dominated plans:
$\mathcal{P}_p = \{p_1, p_2, p_3, p_5, p_7\}$. The other two plans are dominated by some plans in
$\mathcal{P}_p$: $p_4$ is dominated by $p_3$ and $p_6$ is dominated by $p_5$. Plans in
$\mathcal{P}_p$ are depicted as solid dots, and the set of plans
$\mathcal{P}^* = \{p_1, p_3, p_7\}$ that are optimal for some specific value of $w$ is highlighted by
connected dots.

Figure 5: Solid dots represent plans in the Pareto set ($p_1, p_2, p_3, p_5, p_7$). Connected dots
represent plans in the lower convex hull ($p_1, p_3, p_7$) giving the optimal ICP value for any
distribution on the trade-off between cost and time.
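The running example can be reproduced with a short Python sketch. It is only an illustration: the names `PLANS`, `dominates`, `pareto_set` and `optimal_for_some_w` are hypothetical, and the set of plans that are optimal for some weight is approximated here by sampling $w$ on a uniform grid rather than by computing the convex hull exactly.

```python
# (time, cost) values of the seven plans in the running example.
PLANS = {"p1": (4, 25), "p2": (6, 22), "p3": (7, 15), "p4": (8, 20),
         "p5": (10, 12), "p6": (11, 14), "p7": (12, 5)}

def dominates(a, b):
    """a dominates b: no worse in time and cost, strictly better in at least one."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def pareto_set(plans):
    """Names of the plans that no other plan dominates."""
    return {name for name, v in plans.items()
            if not any(dominates(u, v) for u in plans.values())}

def optimal_for_some_w(plans, steps=1000):
    """Plans minimizing w*time + (1-w)*cost for some w on a grid over [0, 1]
    (a sampling approximation of the lower convex hull)."""
    best = set()
    for i in range(steps + 1):
        w = i / steps
        best.add(min(plans, key=lambda n: w * plans[n][0] + (1 - w) * plans[n][1]))
    return best

print(sorted(pareto_set(PLANS)))          # ['p1', 'p2', 'p3', 'p5', 'p7']
print(sorted(optimal_for_some_w(PLANS)))  # ['p1', 'p3', 'p7']
```

Plans $p_2$ and $p_5$ are non-dominated but lie above the segments joining $p_1$, $p_3$ and $p_7$, so no convex combination of time and cost ever selects them.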

IPF for Metric Temporal Planning: The user preference model in our target domain of temporal
planning is represented by a convex combination of the time and cost quality measures, and
the IPF measure now is called Integrated Convex Preference (ICP). Given a set of plans P*, let
t_p = time(p) and c_p = cost(p) be the makespan and total execution cost of plan p ∈ P*. The
ICP value of P* with regard to the objective function V(p, w) = w × t_p + (1 − w) × c_p and the
parameter vector α = (w, 1 − w) (w ∈ [0, 1]) is defined as:

\[ ICP(\mathcal{P}^*) = \sum_{i=1}^{k} \int_{w_{i-1}}^{w_i} h(w)\,\bigl(w \times t_{p_i} + (1 - w) \times c_{p_i}\bigr)\,dw \qquad (12) \]

where w_0 = 0, w_k = 1 and p_i = argmin_{p ∈ P*} V(p, w) ∀w ∈ [w_{i−1}, w_i]. In other words, we
divide [0, 1] into non-overlapping regions such that in each region (w_{i−1}, w_i) there is a
single solution p_i ∈ P* that has better V(p_i, w) value than all other solutions in P*.
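As a rough illustration of the ICP definition, the sketch below evaluates it for the running example of Figure 5 under the assumption of a uniform belief h(w) = 1 on [0, 1]. Since every V(p, w) is linear in w, the lower envelope is piecewise linear, so the integral can be computed exactly by cutting [0, 1] at all pairwise crossing points. The names `PLANS`, `value`, and `icp_uniform` are illustrative choices, not taken from the original work.

```python
from itertools import combinations

# Running example from Figure 5: (time, cost) for each plan.
PLANS = {
    "p1": (4, 25), "p2": (6, 22), "p3": (7, 15), "p4": (8, 20),
    "p5": (10, 12), "p6": (11, 14), "p7": (12, 5),
}

def value(plan, w):
    """V(p, w) = w * time(p) + (1 - w) * cost(p)."""
    t, c = plan
    return w * t + (1 - w) * c

def icp_uniform(plans):
    """ICP of a plan set, ASSUMING the uniform belief h(w) = 1 on [0, 1]."""
    # Cut [0, 1] wherever two plans' value lines cross; between two
    # consecutive cuts the minimizing plan cannot change.
    cuts = {0.0, 1.0}
    for (t1, c1), (t2, c2) in combinations(plans, 2):
        slope_gap = (t1 - c1) - (t2 - c2)
        if slope_gap != 0:
            w = (c2 - c1) / slope_gap
            if 0.0 < w < 1.0:
                cuts.add(w)
    points = sorted(cuts)
    total = 0.0
    for lo, hi in zip(points, points[1:]):
        mid = (lo + hi) / 2
        # The integral of a linear function equals width * midpoint value.
        total += (hi - lo) * min(value(p, mid) for p in plans)
    return total

hull = [PLANS[name] for name in ("p1", "p3", "p7")]
print(round(icp_uniform(hull), 3))
print(round(icp_uniform(list(PLANS.values())), 3))
```

Both calls print the same number, which reflects the fact that only the lower-convex-hull plans contribute to the ICP value. A non-uniform h(w) would need a weighted integral on each sub-interval instead of the midpoint rule used here.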

We select the IPF/ICP measure to evaluate our solution set due to its several nice properties: 

• If P_1, P_2 ⊆ S and ICP(P_1) < ICP(P_2) then P_1 is probabilistically better than P_2
in the sense that for any given w, let p_1 = argmin_{p ∈ P_1} V(p, w) and
p_2 = argmin_{p ∈ P_2} V(p, w), then the probability of V(p_1, w) < V(p_2, w) is higher than
the probability of V(p_1, w) > V(p_2, w).

• If P_1 is obviously better than P_2, then the ICP measure agrees with the assertion. More
formally: if ∀p_2 ∈ P_2, ∃p_1 ∈ P_1 such that p_2 is dominated by p_1, then ICP(P_1) <
ICP(P_2).

Empirically, extensive results on scheduling problems in Fowler et al. (2005) have shown
that the ICP measure "evaluates the solution quality of approximation robustly (i.e., similar to
visual comparison results) while other alternative measures can misjudge the solution quality".

In the next two sections (4 and 5) we investigate the problem of generating high quality plan
sets for the two cases mentioned: when no knowledge about the user's preferences is given, and
when part of it is given as input to the planner.



4. Generating Diverse Plan Set in the Absence of Preference Knowledge 

In this section, we describe approaches to searching for a set of diverse plans with respect to a 
measure defined with base level elements of plans as discussed in the previous section. In partic- 
ular, we consider the quality measure of plan set as the minimal pair-wise distance between any 
two plans, and generate a set of plans containing k plans with the quality of at least a predefined 
threshold d. As discussed earlier, by diversifying the set of plans on the space of base level fea- 
tures, it is likely that plans in the set would cover a wide range of space of unknown high level 
features, increasing the possibility that the user can select a plan close to the one that she prefers. 
The problem is formally defined as follows:

dDISTANTkSET: Find P with P ⊆ S, |P| = k and ζ(P) = min_{p,q ∈ P} δ(p, q) ≥ d

where any distance measure between two plans formalized in Section 3.1.1 can be used to
implement δ(p, q).
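The dDISTANTkSET condition can be checked mechanically once a distance is fixed. The sketch below assumes the action-based set-difference distance (the one written out as Equation 14 in Section 4.2.3) and treats a plan simply as a collection of action names. The function names and the toy plans are hypothetical and only serve the illustration.

```python
from itertools import combinations

def action_distance(plan_a, plan_b):
    """Action-based distance: size of the symmetric difference over the union."""
    a, b = set(plan_a), set(plan_b)
    union = a | b
    if not union:
        return 0.0
    return (len(a - b) + len(b - a)) / len(union)

def plan_set_quality(plans, dist=action_distance):
    """Quality of a plan set: the minimal pairwise distance between its plans."""
    return min((dist(p, q) for p, q in combinations(plans, 2)),
               default=float("inf"))

def is_d_distant_k_set(plans, k, d, dist=action_distance):
    """True if the set has exactly k plans that are pairwise at least d apart."""
    return len(plans) == k and plan_set_quality(plans, dist) >= d

# Toy plans (hypothetical action names).
truck = ["load", "drive", "unload"]
plane = ["load", "fly", "unload"]
walk = ["walk"]

print(is_d_distant_k_set([truck, plane, walk], k=3, d=0.5))
print(is_d_distant_k_set([truck, plane, walk], k=3, d=0.6))
```

Passing another function as `dist` would give the state-based or causal-link-based variants, provided plans are represented with the corresponding elements.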

We now consider two representative state-of-the-art planning approaches in generating diverse
plan sets. The first one is GP-CSP (Do and Kambhampati, 2001), representing constraint-based
planning approaches, and the second one is LPG (Gerevini et al., 2003), which uses an efficient
local-search based approach. We use GP-CSP to compare the relation between different
distance measures in diversifying plan sets. On the other hand, with LPG we stick to the
action-based distance measure, which is shown experimentally to be the most difficult measure
for enforcing diversity (see below), and investigate the scalability of heuristic approaches in
generating diverse plans.



4.1. Finding Diverse Plan Set with GP-CSP 

The GP-CSP planner (Do and Kambhampati, 2001) converts Graphplan's planning graph into a
CSP encoding, and solves it using a standard CSP solver. The solution of the encoding represents
a valid plan for the original planning problem. In the encoding, the CSP variables correspond
to the predicates that have to be achieved at different levels in the planning graph (different
planning steps) and their possible values are the actions that can support the predicates. For
each CSP variable representing a predicate p, there are two special values: i) ⊥: indicates that a
predicate is not supported by any action and is false at a particular level/planning-step; ii) "noop":
indicates that the predicate is true at a given level i because it was made true at some previous
level j < i and no other action deletes p between j and i. Constraints encode the relations
between predicates and actions: 1) mutual exclusion relations between predicates and actions;
and 2) the causal relationships between actions and their preconditions.

4.1.1. Adapting GP-CSP to Different Distance Metrics

When the above planning encoding is solved by any standard CSP solver, it will return a solution
containing (var, value) pairs of the form {(x_1, y_1), ..., (x_n, y_n)}. The collection of x_i where
y_i ≠ ⊥ represents the facts that are made true at different time steps (plan trajectory) and can
be used as a basis for the state-based distance measure;⁷ the set of (y_i ≠ ⊥) ∧ (y_i ≠ noop)
represents the set of actions in the plan and can be used for action-based distance measure;
lastly, the assignments (x_i, y_i) themselves represent the causal relations and can be used for
the causal-based distance measure.

However, there are some technical difficulties we need to overcome before a specific distance
measure between plans can be computed. First, the same action can be represented by different
values in the domains of different variables. Consider a simple example in which there are two
facts p and q, both supported by two actions a_1 and a_2. When setting up the CSP encoding, we
assume that the CSP variables x_1 and x_2 are used to represent p and q. The domains for x_1 and
x_2 are {v_11, v_12} and {v_21, v_22}, both representing the two actions {a_1, a_2} (in that order).
The assignments {(x_1, v_11), (x_2, v_22)} and {(x_1, v_12), (x_2, v_21)} have a distance of 2 in
traditional CSP because different values are assigned for each variable x_1 and x_2. However,
they both represent the same action set {a_1, a_2} and thus lead to the plan distance of 0 if we
use the action-based distance in our plan comparison. Therefore, we first need to translate the
set of values in all assignments back to the set of action instances before doing comparison
using action-based distance. The second complication arises for the causal-based distance. A
causal link a_1 → a_2 between two actions a_1 and a_2 indicates that a_1 supports the
precondition p of a_2. However, the CSP assignment (p, a_1) only provides the first half of each
causal-link. To complete the causal-link, we need to look at the values of other CSP assignments
to identify action a_2 that occurs at the later level in the planning graph and has p as its
precondition. Note that there may be multiple "valid" sets of causal-links for a plan, and in the
implementation we simply select causal-links based on the CSP assignments.
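The first difficulty can be made concrete with a small sketch. It assumes two CSP solutions for the two-fact example, x_1 = v_11, x_2 = v_22 versus x_1 = v_12, x_2 = v_21, and a lookup table from CSP values to actions. The table, the string stand-ins for ⊥ and "noop", and all function names are hypothetical; the actual GP-CSP implementation obtains this mapping from the planning graph.

```python
# Hypothetical value-to-action table for the two-fact example.
VALUE_TO_ACTION = {"v11": "a1", "v12": "a2", "v21": "a1", "v22": "a2"}
BOTTOM, NOOP = "bottom", "noop"  # stand-ins for the special values

def csp_distance(sol_a, sol_b):
    """Traditional CSP distance: number of variables with different values."""
    return sum(1 for var in sol_a if sol_a[var] != sol_b[var])

def action_set(solution):
    """Translate CSP values back to the set of actions they represent."""
    return {VALUE_TO_ACTION[v] for v in solution.values()
            if v not in (BOTTOM, NOOP)}

def action_distance(set_a, set_b):
    """Action-based distance between two action sets."""
    union = set_a | set_b
    if not union:
        return 0.0
    return (len(set_a - set_b) + len(set_b - set_a)) / len(union)

s1 = {"x1": "v11", "x2": "v22"}
s2 = {"x1": "v12", "x2": "v21"}

print(csp_distance(s1, s2))                               # differs on both variables
print(action_distance(action_set(s1), action_set(s2)))    # same action set
```

The two solutions differ on every variable, yet after translation both yield the action set {a1, a2}, so their action-based distance is zero.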

4.1.2. Making GP-CSP Return a Set of Plans 

To make GP-CSP return a set of plans satisfying the dDISTANTkSET constraint using one of
the three distance measures, we add "global" constraints to each original encoding to enforce
d-diversity between every pair of solutions. When each global constraint is called upon by the
normal forward checking and arc-consistency checking procedures inside the default solver to
check if the distance between two plans is over a predefined value d, we first map each set of
assignments to an actual set of actions (action-based), predicates that are true at different
plan-steps (state-based) or causal-links (causal-based) using the method discussed in the
previous section. This process is done by mapping all (var, value) CSP assignments into action
sets using a call to the planning graph, which is outside of the CSP solver, but works closely
with the general purpose CSP solver in GP-CSP. The comparison is then done within the
implementation of the global constraint to decide if two solutions are diverse enough.
We investigate two different ways to use the global constraints: 

1. The parallel strategy to return the set of k plans all at once. In this approach, we create
one encoding that contains k identical copies of each original planning encoding created
using the GP-CSP planner. The k copies are connected together using k(k − 1)/2 pair-wise
global constraints. Each global constraint between the i-th and j-th copies ensures that the
two plans represented by the solutions of those two copies will be at least d distant from
each other. If each copy has n variables, then this constraint involves 2n variables.
2. The greedy strategy to return plans one after another. In this approach, the k copies are not
set up in parallel up-front, but sequentially. We add to the i-th copy one global constraint to
enforce that the solution of the i-th copy should be d-diverse from any of the earlier i − 1
solutions. The advantage of the greedy approach is that each CSP encoding is significantly
smaller in terms of the number of variables (n vs. k × n), smaller in terms of the number
of global constraints (1 vs. k(k − 1)/2), and each global constraint also contains fewer
variables (n vs. 2 × n).⁸ Thus, each encoding in the greedy approach is easier
to solve. However, because each solution depends on all previously found solutions, the
encoding can be unsolvable if the previously found solutions comprise a bad initial solution
set.

7 We implement the state-based distance between plans as defined in Equation 6.
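The greedy strategy and its weakness can be sketched outside of any CSP machinery. The code below assumes that candidate plans arrive one at a time from some solver (here simply a list) and keeps a plan only if it is d-distant from everything kept so far. It is an illustration of the selection logic only, not of the GP-CSP global constraint itself, and all names are hypothetical.

```python
def action_distance(plan_a, plan_b):
    """Action-based distance between two plans given as action collections."""
    a, b = set(plan_a), set(plan_b)
    union = a | b
    if not union:
        return 0.0
    return (len(a - b) + len(b - a)) / len(union)

def greedy_diverse_set(candidates, k, d, dist=action_distance):
    """Keep plans one after another; each must be d-distant from all earlier picks."""
    chosen = []
    for plan in candidates:
        if all(dist(plan, prev) >= d for prev in chosen):
            chosen.append(plan)
            if len(chosen) == k:
                return chosen
    return None  # the earlier picks left no room for k plans

mixed = ["a", "c"]
left = ["a", "b"]
right = ["c", "d"]

# Starting from `mixed` blocks both remaining plans when d = 1.0 ...
print(greedy_diverse_set([mixed, left, right], k=2, d=1.0))
# ... while a different first pick succeeds.
print(greedy_diverse_set([left, right, mixed], k=2, d=1.0))
```

The first call fails although a valid 2-set exists, which mirrors the remark that a bad initial solution set can make later encodings unsolvable.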



4.1.3. Empirical Evaluation 

We implemented the parallel and greedy approaches discussed earlier for the three distance
measures and tested them with the benchmark set of Logistics problems provided with the
Blackbox planner (Kautz and Selman, 1998). All experiments were run on a Linux Pentium 4,
3 GHz machine with 512 MB RAM. For each problem,⁹ we test with different d values ranging
from 0.01 (1%) to 0.95 (95%)¹⁰ and k increases from 2 to n where n is the maximum value for
which GP-CSP can still find solutions within the plan horizon. The horizon (parallel plan steps)
limit is 30.

We found that the greedy approach outperformed the parallel approach and solved a
significantly higher number of problems. Therefore, we focus on the greedy approach hereafter.
For each combination of d, k, and a given distance measure, we record the solving time and
output the average/min/max pairwise distances of the solution sets.

Baseline Comparison: As a baseline comparison, we have also implemented a randomized
approach. In this approach, we do not use global constraints but use random value ordering in
the CSP solver to generate k different solutions without enforcing them to be pairwise
d-distant. For each distance d, we continue running the random algorithm until we find k_max
solutions, where k_max is the maximum value of k that we can solve for the greedy approach for
that particular d value. In general, we want to compare with our approach of using global
constraints to see if the random approach can effectively generate a diverse set of solutions by
looking at: (1) the average time to find a solution in the solution set; and (2) the
maximum/average pairwise distances between k ≥ 2 randomly generated solutions.

Table 2 shows the comparison of average solving time to find one solution in the greedy
and random approaches. The results show that on average, the random approach takes
significantly more time to find a single solution, regardless of the distance measure used by the
greedy approach. To assess the diversity in the solution sets, Table 3 shows the comparison of:
(1) the average pairwise minimum distance between the solutions in sets returned by the random
approach; and (2) the maximum d for which the greedy approach still can find a set of diverse
plans. The comparisons are done for all three distance measures. For example, the first cell
(0.041/0.35) in Table 3 implies that the minimum pairwise distance averaged for all solvable
k ≥ 2 using the random approach is d = 0.041 while it is 0.35 (i.e. 8x more diverse) for the
greedy approach using the δ_a distance measure. Except for 3 cases, using global constraints to
enforce minimum pairwise distance between solutions helps GP-CSP return a significantly more
diverse set of solutions. On average, the greedy approach returns 4.25x, 7.31x, and 2.79x more
diverse solutions than the random approach for δ_a, δ_s and δ_c, respectively.

8 However, each constraint is more complicated because it encodes previously found solutions.
9 log-easy = prob1, rocket-a = prob2, log-a = prob3, log-b = prob4, log-c = prob5, log-d = prob6.
10 Increments of 0.01 from 0.01 to 0.1 and of 0.05 thereafter.

          Prob1    Prob2    Prob3    Prob4     Prob5      Prob6
δ_a       0.087    7.648    1.021    6.144     8.083    178.633
δ_s       0.077    9.354    1.845    6.312     8.667    232.475
δ_c       0.190    6.542    1.063    6.314     8.437    209.287
Random    0.327   15.480    8.982   88.040   379.182   6105.510

Table 2: Average solving time (in seconds) to find a plan using greedy (first 3 rows) and by random (last row) approaches

       Prob1        Prob2        Prob3        Prob4         Prob5        Prob6
δ_a    0.041/0.35   0.067/0.65   0.067/0.25   0.131/0.1*    0.126/0.15   0.128/0.2
δ_s    0.035/0.4    0.05/0.8     0.096/0.5    0.147/0.4     0.140/0.5    0.101/0.5
δ_c    0.158/0.8    0.136/0.95   0.256/0.55   0.459/0.15*   0.346/0.3*   0.349/0.45

Table 3: Comparison of the diversity in the plan sets returned by the random and greedy approaches. Cases where random
approach is better than greedy approach are marked with (*).

Analysis of the different distance-bases: Overall, we were able to solve 1264 (d, k)
combinations for three distance measures δ_a, δ_s, δ_c using the greedy approach. We were
particularly interested in investigating the following issues:

• H1: Computational efficiency - Is it easy or difficult to find a set of diverse solutions
using different distance measures? Thus, (1) for the same d and k values, which distance
measure is more difficult (time consuming) to solve; and (2) given an encoding horizon
limit, how high is the value of d and k for which we can still find a set of solutions for a
given problem using different distance measures.

• H2: Solution diversity - What, if any, is the correlation/sensitivity between different
distance measures? Thus, how diverse are the solutions obtained with one distance measure
when they are evaluated with the other measures?

Regarding H1, Table 4 shows the highest solvable k value for each distance d and base δ_a,
δ_s, and δ_c. For a given (d, k) pair, enforcing δ_a appears to be the most difficult, then δ_s,
and δ_c is the easiest. GP-CSP is able to solve 237, 462, and 565 combinations of (d, k)
respectively for δ_a, δ_s and δ_c. GP-CSP solves dDISTANTkSET problems more easily with δ_s and
δ_c than with δ_a due to the fact that solutions with different action sets (diverse with regard
to δ_a) will likely cause different trajectories and causal structures (diverse with regard to
δ_s and δ_c). Between δ_s and δ_c, δ_c solves more problems for easier instances (Problems 1-3)
but fewer for the harder instances, as shown in Table 4. We conjecture that for solutions with
more actions (i.e. in bigger problems) there are more causal dependencies between actions and
thus it is harder to reorder actions to create a different causal-structure.

For running time comparisons, among 216 combinations of (d, k) that were solved by all
three distance measures, GP-CSP takes the least amount of time for δ_a in 84 combinations, for
δ_s in 70 combinations and in 62 for δ_c. The first three lines of Table 2 show the average time
to find one solution in a d-diverse k-set for each problem using δ_a, δ_s and δ_c (which we call
t_a, t_s and t_c respectively). In general, t_a is the smallest and t_s > t_c in most problems.
Thus, while it is harder to enforce δ_a than δ_s and δ_c (as indicated in Table 4), when the
encodings for all three distances can be solved for a given (d, k), then δ_a takes less time to
search for one plan in the diverse plan set; this can be due to tighter constraints (more pruning
power for the global constraints) and simpler global constraint setting.

d      Prob1     Prob2     Prob3    Prob4   Prob5   Prob6
0.01   11,5,28   8,18,12   9,8,18   3,4,5   4,6,8   8,7,7
0.03   6,3,24    8,13,9    7,7,12   2,4,3   4,6,6   4,7,6
0.05   5,3,18    6,11,9    5,7,10   2,4,3   4,6,5   3,7,5
0.07   2,3,14    6,10,8    4,7,6    2,4,2   4,6,5   3,7,5
0.09   2,3,14    6,9,6     3,6,6    2,4,2   3,6,4   3,7,4
0.1    2,3,10    6,9,6     3,6,6    2,4,2   2,6,4   3,7,4
0.2    2,3,5     5,9,6     2,6,6    1,3,1   1,5,2   2,5,3
0.3    2,2,3     4,7,5     1,4,4    1,2,1   1,3,2   1,3,3
0.4    1,2,3     3,6,5     1,3,3    1,2,1   1,2,1   1,2,3
0.5    1,1,3     2,4,5     1,2,2    -       1,2,1   1,2,1
0.6    1,1,2     2,3,4     -        -       -       -
0.7    1,1,2     1,2,2     -        -       -       -
0.8    1,1,2     1,2,2     -        -       -       -
0.9    -         1,1,2     -        -       -       -

Table 4: For each given d value, each cell shows the largest solvable k for each of the three distance measures δ_a, δ_s,
and δ_c (in this order). The maximum values in cells are in bold. Empty cells are shown as "-".

       δ_a     δ_s     δ_c
δ_a    -       1.262   1.985
δ_s    0.485   -       0.883
δ_c    0.461   0.938   -

Table 5: Cross-validation of distance measures δ_a, δ_s, and δ_c.

To test H2, in Table 5 we show the cross-comparison between different distance measures
δ_a, δ_s, and δ_c. In this table, cell (row, column) = (δ′, δ″) gives, over all combinations
of (d, k) solved for distance δ′, the average value of d″/d′, where d″ and d′ are distances
measured according to δ″ and δ′, respectively (d′ ≥ d). For example, (δ_s, δ_a) = 0.485 means
that over 462 combinations of (d, k) solvable for δ_s, for each d, the average distance between
k solutions measured by δ_a is 0.485 × d_s. The results indicate that when we enforce d for δ_a,
we will likely find even more diverse solution sets according to δ_s (1.26 × d_a) and δ_c
(1.98 × d_a). However, when we enforce d for either δ_s or δ_c, we are not likely to find a more
diverse set of solutions measured by the other two distance measures. Nevertheless, enforcing d
using δ_c will likely give comparable diverse degree d for δ_s (0.94 × d_c) and vice versa. We
also observe that d_s is highly dependent on the difference between the parallel lengths of
plans in the set. The distance d_s seems to be the smallest (i.e. d_s < d_a < d_c) when all k
plans have the same/similar number of time steps. This is consistent with the fact that δ_a and
δ_c do not depend on the steps in the plan execution trajectory while δ_s does.

4.2. Finding Diverse Plan Set with LPG

In this section, we consider the problem of generating diverse sets of plans using another
planning approach, in particular the LPG planner, which is able to scale up to bigger problems
compared to GP-CSP. We focus on the action-based distance measure between plans, which has been
shown in the previous section to be the most difficult for enforcing diversity. LPG is a
local-search-based planner that incrementally modifies a partial plan in a search for a plan that
contains no flaws (Gerevini et al., 2003). The behavior of LPG is controlled by an evaluation
function that is used to select between different plan candidates in a neighborhood generated for
local search. At each search step, the elements in the search neighborhood of the current partial
plan π are the alternative possible plans repairing a selected flaw in π. The elements of the
neighborhood are evaluated according to an action evaluation function E (Gerevini et al., 2003).
This function is used to estimate the cost of either adding or removing an action node a in the
partial plan p being generated.



4.2.1. Revised Evaluation Function for Diverse Plans 

In order to manage dDISTANCEkSET problems, the function E has been extended to include an
additional evaluation term that has the purpose of penalizing the insertion and removal of actions
that decrease the distance of the current partial plan p under adaptation from a reference plan
p_0. In general, E consists of four weighted terms, evaluating four aspects of the quality of the
current plan that are affected by the addition (E(a)^i) or removal (E(a)^r) of a:

\[ E(a)^i = \alpha_E \cdot \mathit{Execution\_cost}(a)^i + \alpha_T \cdot \mathit{Temporal\_cost}(a)^i + \alpha_S \cdot \mathit{Search\_cost}(a)^i + \alpha_D \cdot |(p_0 - p) \cap p_R| \]

\[ E(a)^r = \alpha_E \cdot \mathit{Execution\_cost}(a)^r + \alpha_T \cdot \mathit{Temporal\_cost}(a)^r + \alpha_S \cdot \mathit{Search\_cost}(a)^r + \alpha_D \cdot |(p_0 - p - a) \cap p_R| \]



The first three terms of the two forms of E are unchanged from the standard behavior of
LPG. The fourth term, used only for computing diverse plans, is the new term estimating how the
proposed plan modification will affect the distance from the reference plan p_0. Each cost term
in E is computed using a relaxed temporal plan p_R (Gerevini et al., 2003).

The p_R plans are computed by an algorithm, called RelaxedPlan, formally described and
illustrated in Gerevini et al. (2003). We have slightly modified this algorithm to penalize the
selection of actions decreasing the plan distance from the reference plan. The specific change to
RelaxedPlan for computing diverse plans is very similar to the change described in Fox et al.
(2006), and it concerns the heuristic function for selecting the actions for achieving the
subgoals in the relaxed plans. In the modified function for RelaxedPlan, we have an extra 0/1
term that penalizes an action b for p_R if its addition decreases the distance of p + p_R from
p_0 (in the plan repair context investigated in Fox et al. (2006), b is penalized if its addition
increases such a distance).

The last term of the modified evaluation function E is a measure of the decrease in plan
distance caused by adding or removing a: |(p_0 − p) ∩ p_R| or |(p_0 − p − a) ∩ p_R|, where p_R
contains the new action a. The α-coefficients of the E-terms are used to weigh their relative
importance.¹¹ The values of the first 3 terms are automatically derived from the expression
defining the plan metric for the problem (Gerevini et al., 2003). The coefficient for the fourth
new term of E (α_D) is automatically set during search to a value proportional to d/δ_a(p, p_0),
where p is the current partial plan under construction. The general idea is to dynamically
increase the value of α_D according to the number of plans n that have been generated so far: if
n is much higher than k, the search process consists of finding many solutions with not enough
diversification, and hence the importance of the last E-term should increase.

11 These coefficients are also normalized to a value in [0, 1] using the method described in Gerevini et al. (2003).

4.2.2. Making LPG Return a Set of Plans 

In order to compute a set of k d-distant plans solving a dDISTANCEkSET problem, we run the
LPG search multiple times, until the problem is solved, with the following two additional changes
to the standard version of LPG: (i) the preprocessing phase computing mutex relations and other
reachability information exploited during the relaxed plan construction is done only once for all
runs; (ii) we maintain an incremental set of valid plans, and we dynamically select one of them
as the reference plan p_0 for the next search. Concerning (ii), let P = {p_1, ..., p_n} be the set
of n valid plans that have been computed so far, and CPlans(p_i) the subset of P containing all
plans that have a distance greater than or equal to d from a reference plan p_i ∈ P.

The reference plan p_0 used in the modified heuristic function E is a plan p_max ∈ P which
has a maximal set of diverse plans in P, i.e.,

\[ p_{max} = \operatorname*{argmax}_{p_i \in \mathcal{P}} \bigl\{ |CPlans(p_i)| \bigr\} \qquad (13) \]

The plan p_max is incrementally computed each time the local search finds a new solution.
In addition to being used to identify the reference plan in E, p_max is also used for defining
the initial state (partial plan) of the search process. Specifically, we initialize the search
using a (partial) plan obtained by randomly removing some actions from a (randomly selected) plan
in the set CPlans(p_max) ∪ {p_max}.

The process of generating diverse plans starting from a dynamically chosen reference plan
continues until at least k plans that are all d-distant from each other have been produced. The
modified version of LPG to compute diverse plans is called LPG-d.
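The choice of the reference plan in Equation 13 amounts to a simple argmax over the plans found so far. The following sketch assumes the action-based distance and represents plans as lists of action names. The function names and the toy plans are hypothetical, and the real LPG-d updates p_max incrementally rather than recomputing it from scratch.

```python
def action_distance(plan_a, plan_b):
    """Action-based distance between two plans given as action collections."""
    a, b = set(plan_a), set(plan_b)
    union = a | b
    if not union:
        return 0.0
    return (len(a - b) + len(b - a)) / len(union)

def cplans(ref, plans, d, dist=action_distance):
    """CPlans(ref): plans whose distance from the reference plan is at least d."""
    return [p for p in plans if dist(ref, p) >= d]

def select_reference(plans, d, dist=action_distance):
    """Pick the plan with the largest set of d-distant plans (first one on ties)."""
    return max(plans, key=lambda p: len(cplans(p, plans, d, dist)))

# Toy set of valid plans found so far (hypothetical action names).
q1 = ["a", "b"]
q2 = ["c", "d"]
q3 = ["e", "f"]
q4 = ["a", "b", "c"]

p_max = select_reference([q1, q2, q3, q4], d=0.9)
print(p_max, len(cplans(p_max, [q1, q2, q3, q4], d=0.9)))
```

For d > 0 a plan is never in its own CPlans set, since its distance from itself is zero.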

4.2.3. Experimental Analysis with LPG-d 

Recall that the distance function δ_a, using set-difference, can be written as the sum of two terms:

\[ \delta_a(p_i, p_j) = \frac{|A(p_i) - A(p_j)|}{|A(p_i) \cup A(p_j)|} + \frac{|A(p_j) - A(p_i)|}{|A(p_i) \cup A(p_j)|} \qquad (14) \]

The first term represents the contribution of the actions in p_i to the plan difference, while
the second term indicates the contribution of p_j to δ_a. We experimentally observed that in some
cases the differences between two diverse plans computed using δ_a are mostly concentrated in
only one of the δ_a components. This asymmetry means that one of the two plans can have many
more actions than the other one, which could imply that the quality of one of the two plans is
much worse than the quality of the other plan. In order to avoid this problem, we can parametrize
δ_a by imposing the two extra constraints

\[ \delta_a^A \ge d/\gamma \quad \text{and} \quad \delta_a^B \ge d/\gamma \]

where δ_a^A and δ_a^B are the first and second terms of δ_a, respectively, and γ is an integer
parameter "balancing" the diversity of p_i and p_j.

Figure 6: Performance of LPG-d (CPU-time and plan distance) for the problem pfile20 in the DriverLog-Time domain.
(Panels: median of the CPU time and median of the average distances for dDISTANCEkSET, γ = 3.)

In this section, we analyze the performance of the modified version of LPG, called LPG-d, in
three different benchmark domains from the 3rd and 5th IPCs. The main goals of the experimental
evaluation were (i) showing that LPG-d can efficiently solve a large set of (d, k)-combinations,
(ii) investigating the impact of the δ_a γ-constraints on performance, (iii) comparing LPG-d and
the standard LPG.
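For concreteness, the γ-constraints on δ_a referred to in goal (ii) can be sketched as follows: the two terms of δ_a are computed separately, and a pair of plans counts as diverse only if the total reaches d and each term reaches d/γ. Plans are again represented as collections of action names, and the function names are hypothetical.

```python
def delta_a_terms(plan_i, plan_j):
    """Return the two terms of the action-based distance separately."""
    a_i, a_j = set(plan_i), set(plan_j)
    union = a_i | a_j
    if not union:
        return 0.0, 0.0
    return len(a_i - a_j) / len(union), len(a_j - a_i) / len(union)

def is_balanced_d_distant(plan_i, plan_j, d, gamma=None):
    """Check the distance threshold d, plus the gamma-constraints if gamma is given."""
    first, second = delta_a_terms(plan_i, plan_j)
    if first + second < d:
        return False
    if gamma is None:  # original, unparametrized distance
        return True
    return first >= d / gamma and second >= d / gamma

short_plan = ["a"]
long_plan = ["a", "b", "c", "d"]

# All of the difference comes from the longer plan.
print(delta_a_terms(short_plan, long_plan))
print(is_balanced_d_distant(short_plan, long_plan, d=0.6))
print(is_balanced_d_distant(short_plan, long_plan, d=0.6, gamma=3))
```

The pair passes the plain threshold test but fails once γ = 3 is imposed, because one plan contributes nothing to the difference.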

We tested LPG-d using both the default and parametrized versions of δ_a, with γ = 2 and
γ = 3. We give detailed results for γ = 3 and a more general evaluation for γ = 2 and the
original δ_a. We consider d that varies from 0.05 to 0.95, using a 0.05 increment step, and with
k = 2, ..., 5, 6, 8, 10, 12, 14, 16, 20, 24, 28, 32 (overall, a total of 266 (d, k)-combinations).
Since LPG-d is a stochastic planner, we use the median of the CPU times (in seconds) and the
median of the average plan distances (over five runs). The average plan distance for a set of k
plans solving a specific (d, k)-combination (δ_av) is the average of the plan distances between
all pairs of plans in the set. The tests were performed on an AMD Athlon(tm) XP 2600+, 512 MB
RAM. The CPU-time limit was 300 seconds.

Figure 7: Performance of LPG-d (CPU-time and plan distance) for the problem pfile20 in the Satellite-Strips domain.
(Panels: median of the CPU time and median of the average distances for dDISTANCEkSET, γ = 3.)

Figure 6 gives the results for the largest problem in IPC-3 DriverLog-Time (fully-automated
track). LPG-d solves 109 (d, k)-combinations, including combinations d < 0.4 and k = 10, and
d = 0.95 and k = 2. The average CPU time (top plots) is 162.8 seconds. The average δ_av
(bottom plots) is 0.68, with δ_av always greater than 0.4. With the original δ_a function LPG-d
solves 110 (d, k)-combinations, the average CPU time is 160 seconds, and the average δ_av is
0.68; while with γ = 2 LPG-d solves 100 combinations, the average CPU time is 169.5 seconds,
and the average δ_av is 0.69.

Figure 7 shows the results for the largest problem in IPC-3 Satellite-Strips. LPG-d solves
211 (k, d)-combinations; 173 of them require less than 10 seconds. The average CPU time is
12.1 seconds, and the average δ_av is 0.69. We observed similar results when using the original
δ_a function or the parametrized δ_a with γ = 2 (in the second case, LPG-d solves 198 problems,
while the average CPU time and the average δ_av are nearly the same as with γ = 3).

Figure 8: Performance of LPG-d (CPU-time and plan distance) for the problem pfile15 in the Storage-Propositional
domain. (Surviving panel title: median of the CPU time for dDISTANCEkSET, γ = 3.)

Figure 8 shows the results for a middle-size problem in IPC-5 Storage-Propositional. With
γ = 2 LPG-d solves 225 (k, d)-combinations, 39 of which require less than 10 seconds, while
128 of them require less than 50 seconds. The average CPU time is 64.1 seconds and the average
δ_av is 0.88. With the original δ_a LPG-d solves 240 (k, d)-combinations, the average CPU time
is 41.8 seconds, and the average δ_av is 0.87; with γ = 3 LPG-d solves 206 combinations, the
average CPU time is 69.4 seconds and the average δ_av is 0.89.

The local search in LPG is randomized by a "noise" parameter that is automatically set and
updated during search (Gerevini et al., 2003). This randomization is one of the techniques used
for escaping local minima, but it also can be useful for computing diverse plans: if we run the
search multiple times, each search is likely to consider different portions of the search space,
which can lead to different solutions. It is then interesting to compare LPG-d and a method in
which we simply run the standard LPG until k d-diverse plans are generated. An experimental
comparison of the two approaches shows that in many cases LPG-d performs better. In particular,
the new evaluation function E is especially useful for planning problems that are easy to solve for
the standard LPG, and that admit many solutions. In these cases, the original E function produces
many valid plans with not enough diversification. This problem is significantly alleviated by the
new term in E. An example of a domain where we observed this behavior is logistics.¹²

5. Generating Plan Sets with Partial Preference Knowledge 

In this section, we consider the problem of generating plan sets when the user's preferences
are only partially expressed. In particular, we focus on metric temporal planning where the
preference model is assumed to be represented by an incomplete value function specified by a
convex combination of two features: plan makespan and execution cost, with the exact trade-off
value w drawn from a given distribution. The quality value of plan sets is measured by the ICP
value, as formalized in Equation 12. Our objective is to find a set of plans P ⊆ S where |P| ≤ k
and ICP(P) is the lowest.

Notice that we restrict the size of the solution set returned, not only for the comprehension
issue discussed earlier, but also for an important property of the ICP measure: it is a
monotonically non-increasing function of the solution set (specifically, given two solution sets
P_1 and P_2 such that the latter is a superset of the former, it is easy to see that
ICP(P_2) ≤ ICP(P_1)).

5.1. Sampling Weight Values 

Given that the distribution of the trade-off value $w$ is known, the straightforward way to find a set
of representative solutions is to first sample a set of $k$ values for $w$: $\{w_1, w_2, \ldots, w_k\}$ based on
the distribution $h(w)$. For each value $w_i$, we can find an (optimal) plan $p_i$ minimizing the value
of the overall value function $V(p, w_i) = w_i \times t_p + (1 - w_i) \times c_p$. The final set of solutions
$\mathcal{P} = \{p_1, p_2, \ldots, p_k\}$ is then filtered to remove duplicates and dominated solutions, thus selecting
the plans making up the lower convex hull. The final set can then be returned to the user. While
intuitive and easy to implement, this sampling-based approach has several potential flaws that
can limit the quality of its resulting plan set.
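The sampling procedure described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: it assumes that a plan is summarized by a (makespan, cost) pair and that a finite list `candidates` stands in for the planner's search (both are assumptions made for the example). The plans (4, 25) and (12, 5) are the values quoted for $p_1$ and $p_7$ in the running example; (10, 20) is a made-up extra plan.

```python
import random

def value(plan, w):
    """V(p, w) = w * t_p + (1 - w) * c_p for a plan given as (makespan, cost)."""
    t, c = plan
    return w * t + (1 - w) * c

def dominates(p, q):
    """p dominates q if it is no worse on both features and is a different plan."""
    return p[0] <= q[0] and p[1] <= q[1] and p != q

def sampling_plan_set(candidates, k, sample_w, rng):
    """Pick the best candidate for each of k sampled weights, then filter."""
    weights = [sample_w(rng) for _ in range(k)]
    chosen = [min(candidates, key=lambda p: value(p, w)) for w in weights]
    unique = list(dict.fromkeys(chosen))  # drop duplicates, keep first-seen order
    return [p for p in unique if not any(dominates(q, p) for q in unique)]

# Hypothetical candidate pool standing in for the planner's solution space.
candidates = [(4, 25), (12, 5), (10, 20)]
rng = random.Random(0)
plans = sampling_plan_set(candidates, k=10, sample_w=lambda r: r.random(), rng=rng)
print(plans)
```

Because every returned plan is the minimizer of $V(p, w)$ for some sampled weight, the number of distinct plans is bounded by the number of weight ranges that the samples happen to hit, which is the weakness discussed next.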

First, given that the $k$ solution plans are searched for sequentially and independently of each other,
even if the plan $p_i$ found for each $w_i$ is optimal, the final solution set $\mathcal{P} = \{p_1, p_2, \ldots, p_k\}$ may
not be the optimal set of $k$ solutions with regard to the ICP measure. More specifically,
for a given set of solutions $\mathcal{P}$, some trade-off value $w$, and two non-dominated plans $p$, $q$ such
that $V(p, w) < V(q, w)$, it is possible that $ICP(\mathcal{P} \cup \{p\}) > ICP(\mathcal{P} \cup \{q\})$. In our running
example (Figure 5), let $\mathcal{P} = \{p_2, p_5\}$ and $w = 0.8$; then $V(p_1, w) = 0.8 \times 4 + 0.2 \times 25 = 8.2 <
V(p_7, w) = 0.8 \times 12 + 0.2 \times 5 = 10.6$. Thus, the planner will select $p_1$ to add to $\mathcal{P}$ because
it looks locally better given the weight $w = 0.8$. However, $ICP(\{p_1, p_2, p_5\}) \approx 10.05 >
ICP(\{p_2, p_5, p_7\}) \approx 7.71$, so once the previous set is taken into consideration, $p_7$ is a much
better choice than $p_1$.



12 E.g., for logistics_a (prob3 of Table 2), LPG-d solved 128 instances, 41 of them in less than 1 CPU second
and 97 of them in less than 10 CPU seconds; the average CPU time was 16.7 seconds and the average S av was 0.38.
With the standard LPG, only 78 instances were solved, 20 of them in less than 1 CPU second and 53 of them in
less than 10 CPU seconds; the average CPU time was 23.6 seconds and the average S av was 0.27.
Algorithm 1: Incrementally find solution set $\mathcal{P}$

 1  Input: A planning problem with a solution space $\mathcal{S}$; maximum number of plans required $k$;
           number of sampled trade-off values $k_0$ ($0 < k_0 < k$); time bound $t$;
 2  Output: A plan set $\mathcal{P}$ ($|\mathcal{P}| \le k$);
 3  begin
 4      $W \leftarrow$ sample $k_0$ values for $w$;
 5      $\mathcal{P} \leftarrow$ find good quality plans in $\mathcal{S}$ for each $w \in W$;
 6      while $|\mathcal{P}| < k$ and search_time $< t$ do
 7          Search for $p$ s.t. $ICP(\mathcal{P} \cup \{p\}) < ICP(\mathcal{P})$
 8          $\mathcal{P} \leftarrow \mathcal{P} \cup \{p\}$
 9      end
10      Return $\mathcal{P}$
11  end



Second, the values of the trade-off parameter $w$ are sampled based on a given distribution, and
independently of the particular planning problem being solved. As there is no relation between
the sampled $w$ values and the solution space of a given planning problem, the sampling approach
may return very few distinct solutions even if we sample a large number of weight values $w$. In
our example, if all $w$ samples have values $w < 0.67$ then the optimal solution returned for any of
them will always be $p_7$. However, we know that $\mathcal{P}^* = \{p_1, p_3, p_7\}$ is the optimal set according
to the ICP measure. Indeed, if $w < 0.769$ then the sampling approach can only find the set
$\{p_7\}$ or $\{p_3, p_7\}$ and still cannot find the optimal set $\mathcal{P}^*$.
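The effect can be checked numerically with a small sketch. The (makespan, cost) values of $p_1$ = (4, 25) and $p_7$ = (12, 5) are those quoted in the running example; the value $p_3$ = (7, 15) is an assumption introduced here only because it reproduces the thresholds 0.67 and 0.769 mentioned above (the actual value would have to be read off Figure 5).

```python
def value(plan, w):
    """V(p, w) = w * t_p + (1 - w) * c_p for a plan given as (makespan, cost)."""
    t, c = plan
    return w * t + (1 - w) * c

# p1 and p7 come from the text; p3 = (7, 15) is an assumed, illustrative value.
plans = {"p1": (4, 25), "p3": (7, 15), "p7": (12, 5)}

def best_plan(w):
    """Name of the plan minimizing V(p, w) among the three plans."""
    return min(plans, key=lambda name: value(plans[name], w))

def crossover(p, q):
    """Weight at which V(p, w) = V(q, w), using V(p, w) = c_p + w * (t_p - c_p)."""
    (tp, cp), (tq, cq) = p, q
    return (cq - cp) / ((tp - cp) - (tq - cq))

low_samples = [0.05, 0.2, 0.35, 0.5, 0.65]        # all samples below 0.67
print({best_plan(w) for w in low_samples})         # distinct plans found
print(round(crossover(plans["p7"], plans["p3"]), 3))
print(round(crossover(plans["p3"], plans["p1"]), 3))
```

Under these assumed values, every sample below the first crossover weight returns the same plan, regardless of how many samples are drawn.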

5.2. ICP Sequential Approach 

Given the potential drawbacks of the sampling approach outlined above, we also pursued an 
alternative approach that takes into account the ICP measure more actively. Specifically, we 
incrementally build the solution set $\mathcal{P}$ by finding a solution $p$ such that $\mathcal{P} \cup \{p\}$ has the lowest
ICP value. We can start with an empty solution set $\mathcal{P} = \emptyset$, then at each step try to find a new
plan $p$ such that $\mathcal{P} \cup \{p\}$ has the lowest ICP value.

While this approach directly takes the ICP measure into consideration at each step of finding 
a new plan and avoids the drawbacks of the sampling-based approach, it also has its own share 
of potential flaws. Given that the set is built incrementally, the earlier steps where the first "seed" 
solutions are found are very important. The closer the seed solutions are to the global lower 
convex hull, the better the improvement in the ICP value. In our example (Figure 5), if the first
plan found is $p_2$ then the subsequent plan found to best extend $\{p_2\}$ can be $p_5$, and thus the final
set does not come close to the optimal set $\mathcal{P}^* = \{p_1, p_3, p_7\}$.
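A minimal sketch of the ICP-sequential idea follows. It rests on several assumptions that are not taken from the paper's implementation: plans are (makespan, cost) pairs, the search is replaced by an exhaustive scan over a finite `candidates` list, and the ICP value is approximated numerically as the expected value, over the weight density $h(w)$, of the best plan's value in the set (Equation 12 is not reproduced in this section, so this reading of it is an assumption). The plan (7, 15) and the plan (10, 20) are illustrative values.

```python
def value(plan, w):
    """V(p, w) = w * t_p + (1 - w) * c_p for a plan given as (makespan, cost)."""
    t, c = plan
    return w * t + (1 - w) * c

def icp(plan_set, density=lambda w: 1.0, steps=1000):
    """Midpoint-rule estimate of the integral of h(w) * min_p V(p, w) over [0, 1]."""
    if not plan_set:
        return float("inf")  # an empty set offers the user nothing
    total = 0.0
    for i in range(steps):
        w = (i + 0.5) / steps
        total += density(w) * min(value(p, w) for p in plan_set) / steps
    return total

def icp_sequential(candidates, k):
    """Greedily add the plan that lowers the ICP value the most, up to k plans."""
    plan_set = []
    while len(plan_set) < k:
        remaining = [p for p in candidates if p not in plan_set]
        if not remaining:
            break
        best = min(remaining, key=lambda p: icp(plan_set + [p]))
        if icp(plan_set + [best]) >= icp(plan_set):
            break  # no remaining plan improves the set
        plan_set.append(best)
    return plan_set

candidates = [(4, 25), (7, 15), (12, 5), (10, 20)]  # hypothetical pool
result = icp_sequential(candidates, k=4)
print(result)
```

With an exhaustive scan the greedy choice is always available; in a heuristic planner the first plans found may be far from the lower convex hull, which is the weakness described above.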

5.3. Hybrid Approach 

In this approach, we aim to combine the strengths of both the sampling and ICP-sequential
approaches. Specifically, we use sampling to find several plans optimizing for different weights.
The plans are then used to seed the subsequent ICP-sequential runs. By seeding the hybrid
approach with a good quality plan set scattered across the Pareto optimal set, we hope to gradually
expand the initial set to a final set with a much better overall ICP value. Algorithm 1 shows the
pseudo-code for the hybrid approach. We first independently sample a set of $k_0$ values (with
$k_0$ pre-determined) of $w$ given the distribution on $w$ (step 4). We then run a heuristic planner
multiple times to find an optimal (or good quality) solution for each trade-off value $w$ (step 5).
We then collect the plans found and use them to seed the subsequent runs, in which we incrementally
update the initial plan set with plans that lower the overall ICP value (steps 6-8). The algorithm
terminates and returns the latest plan set (step 10) if $k$ plans are found or the time bound is exceeded.
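The hybrid procedure can be sketched along the lines of Algorithm 1. As before, this is an illustration under stated assumptions rather than the actual implementation: plans are (makespan, cost) pairs, a finite `candidates` list replaces the planner, the ICP value is approximated as the expected best value over the weight density, and the time bound is replaced by stopping when no candidate lowers the ICP value. All plan values other than (4, 25) and (12, 5) are hypothetical.

```python
import random

def value(plan, w):
    """V(p, w) = w * t_p + (1 - w) * c_p for a plan given as (makespan, cost)."""
    t, c = plan
    return w * t + (1 - w) * c

def icp(plan_set, density=lambda w: 1.0, steps=1000):
    """Midpoint-rule estimate of the integral of h(w) * min_p V(p, w) over [0, 1]."""
    total = 0.0
    for i in range(steps):
        w = (i + 0.5) / steps
        total += density(w) * min(value(p, w) for p in plan_set) / steps
    return total

def hybrid(candidates, k, k0, sample_w, rng):
    # Steps 4-5: seed the set with the best plan for each of k0 sampled weights.
    weights = [sample_w(rng) for _ in range(k0)]
    seeds = [min(candidates, key=lambda p: value(p, w)) for w in weights]
    plan_set = list(dict.fromkeys(seeds))
    # Steps 6-8: keep adding plans that strictly lower the ICP value.
    while len(plan_set) < k:
        current = icp(plan_set)
        improving = [p for p in candidates
                     if p not in plan_set and icp(plan_set + [p]) < current]
        if not improving:
            break  # stands in for the time bound of Algorithm 1
        plan_set.append(min(improving, key=lambda p: icp(plan_set + [p])))
    return plan_set  # step 10

candidates = [(4, 25), (7, 15), (12, 5), (10, 20)]  # hypothetical pool
rng = random.Random(1)
result = hybrid(candidates, k=10, k0=3, sample_w=lambda r: r.random(), rng=rng)
print(result)
```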

5.4. Making LPG Search Sensitive to ICP 

Since the LPG planner used in the previous section cannot handle numeric fluents, in particular
the totalcost representing plan cost that we are interested in, we use a modified version of the
Metric-LPG planner (Gerevini et al., 2008) in implementing our algorithms. Not only is Metric-LPG
equipped with a very flexible local-search framework that has been extended to handle
various objective functions, but it can also be made to search for single or multiple solutions.
Specifically, for the sampling-based approach, we first sample the $w$ values based on a given
distribution. For each $w$ value, we set the metric function in the domain file to $w \times makespan +
(1 - w) \times totalcost$, and run the original LPG in quality mode to heuristically find the best
solution within the time limit for that metric function. The final solution set is filtered to remove
any duplicate solutions, and returned to the user.

For the ICP-sequential and hybrid approaches, we cannot use the original LPG implementation
as is and need to modify the neighborhood evaluation function in LPG to take into account the
ICP measure and the current plan set $\mathcal{P}$. In the rest of this section, we explain this procedure
in detail.

Background: Metric-LPG uses local search to find plans within the space of numerical action 
graphs (NA-graph). This leveled graph consists of a sequence of interleaved proposition and 
action layers. The proposition layers consist of a set of propositional and numerical nodes, while 
each action layer consists of at most one action node, and a number of no-op links. An NA-graph 
G represents a valid plan if all actions' preconditions are supported by some actions appearing
at earlier levels in G. The search neighborhood for each local-search step is defined by a set
of graph modifications to fix some remaining inconsistency (unsupported precondition) p at a
particular level l. This can be done by either inserting a new action a supporting p or removing
from the graph the action a of which p is a precondition (which can introduce new inconsistencies).

Each local move creates a new NA-graph G', which is evaluated as a weighted combination 
of two factors: SearchCost(G') and ExecCost(G'). Here, SearchCost(G') is the amount of 
search effort to resolve inconsistencies newly introduced by inserting or removing action a; it is 
measured by the number of actions in a relaxed plan R resolving all such inconsistencies. The 
total cost ExecCost(G'), which is a default function to measure plan quality, is measured by 
the total action execution costs of all actions in R. The two weight adjustment values $\alpha$ and $\beta$
are used to steer the search toward either finding a solution quickly (higher $\alpha$ value) or better
solution quality (higher $\beta$ value). Metric-LPG then selects the local move leading to the smallest
E(G') value.

Adjusting the evaluation function E(G') for finding a set of plans with low ICP measure:

To guide Metric-LPG towards optimizing our ICP-sensitive objective function instead of the
original cost-minimizing objective function, we need to replace the default plan quality measure
ExecCost(G') with a new measure ICPEst(G'). Specifically, we adjust the function
for evaluating each new NA-graph generated by local moves at each step to be a combination
of SearchCost(G') and ICPEst(G'). Given the set of found plans $\mathcal{P} = \{p_1, p_2, \ldots, p_n\}$,
ICPEst(G') guides Metric-LPG's search toward a plan $p$ generated from G' such that the resulting
set $\mathcal{P} \cup \{p\}$ has a minimum ICP value: $p = \arg\min_{p} ICP(\mathcal{P} \cup \{p\})$. Thus, ICPEst(G')
estimates the expected total ICP value if the best plan $p$ found by expanding G' is added to the
current found plan set $\mathcal{P}$. As in the original Metric-LPG, $p$ is estimated by $p_R = G' \cup R$, where
$R$ is the relaxed plan resolving inconsistencies in G' caused by inserting or removing $a$. The
ICPEst(G') for a given NA-graph G' is calculated as $ICPEst(G') = ICP(\mathcal{P} \cup \{p_R\})$, with the
ICP measure as described in Equation 12. Notice here that while $\mathcal{P}$ is the set of valid plans, $p_R$ is
not. It is an invalid plan represented by an NA-graph containing some unsupported preconditions.
However, Equation 12 is still applicable as long as we can measure the time and cost dimensions
of $p_R$. To measure the makespan of $p_R$, we estimate the time points at which unsupported facts
in G' would be supported in $p_R = G' \cup R$ and propagate them over actions in G' to its last
level. We then take the earliest time point at which all facts at the last level appear to measure
the makespan of $p_R$. For the cost measure, we just sum the individual costs of all actions in $p_R$.

At each step of Metric-LPG's local search framework, combining the two measures ICPEst(G')
and SearchCost(G') gives us an evaluation function that fits right into the original Metric-LPG
framework and prefers an NA-graph G' in the neighborhood of G that gives the best trade-off
between the estimated effort to repair and the estimated decrease in quality of the next resulting
plan set.
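The modified evaluation can be illustrated with a short sketch. It assumes, for illustration only, that the combination is the linear form $\alpha \times SearchCost(G') + \beta \times ICPEst(G')$, that the estimated plan $p_R$ has already been reduced to an estimated (makespan, cost) pair, and that the ICP value is approximated as the expected best value over a uniform weight density. The names `evaluate_move` and `neighbors`, and all numbers except the plans (12, 5) and (4, 25), are hypothetical.

```python
def value(plan, w):
    """V(p, w) = w * t_p + (1 - w) * c_p for a plan given as (makespan, cost)."""
    t, c = plan
    return w * t + (1 - w) * c

def icp(plan_set, density=lambda w: 1.0, steps=1000):
    """Midpoint-rule estimate of the integral of h(w) * min_p V(p, w) over [0, 1]."""
    total = 0.0
    for i in range(steps):
        w = (i + 0.5) / steps
        total += density(w) * min(value(p, w) for p in plan_set) / steps
    return total

def evaluate_move(search_cost, p_r, plan_set, alpha, beta):
    """Assumed form: E(G') = alpha * SearchCost(G') + beta * ICPEst(G'),
    with ICPEst(G') = ICP(plan_set + [p_R])."""
    icp_est = icp(plan_set + [p_r])
    return alpha * search_cost + beta * icp_est

# Current plan set and two hypothetical neighbors of the current NA-graph.
plan_set = [(12, 5)]
neighbors = {
    "A": {"search_cost": 2, "p_r": (4, 25)},  # relaxed plan far from the current set
    "B": {"search_cost": 2, "p_r": (11, 6)},  # relaxed plan close to the current set
}
scores = {name: evaluate_move(n["search_cost"], n["p_r"], plan_set, alpha=1.0, beta=1.0)
          for name, n in neighbors.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

With equal repair effort, the move whose estimated plan complements the current set receives the lower score, which is the bias the ICP-sensitive evaluation is meant to introduce.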

5.5. Experimental Results 

We have implemented several approaches based on the algorithms discussed in the previous
sections: Sampling (Section 5.1), ICP-sequential (Section 5.2) and Hybrid, which combines both
(Section 5.3), with both the uniform and triangular distributions. We consider two triangular
distributions in which the most probable weight for plan makespan is 0.2 and 0.8, which we
call the "w02" and "w08" distributions respectively (Figure 9 shows these distributions). We test all
implementations against a set of 20 problems in each of several benchmark temporal planning
domains used in previous International Planning Competitions (IPC): ZenoTravel, DriverLog,
and Depots. The only modification to the original benchmark set is the added action costs.
The descriptions of these domains can be found at the IPC website (ipc.icaps-conference.org).
The experiments were conducted on an Intel Core2 Duo machine with a 3.16GHz CPU and 4GB RAM.
For all approaches, we search for a maximum of $k = 10$ plans within a 10-minute time limit for
each problem (i.e., $t = 10$ minutes), and the resulting plan set is used to compute the ICP value.
In the Sampling approach, we generate ten trade-off values $w$ between makespan and plan cost
based on the distribution, and for each one we search for a plan $p$ subject to the value function
$V(p, w) = w \times t_p + (1 - w) \times c_p$. In the Hybrid approach, on the other hand, the Sampling
approach is first used with $k_0 = 3$ generated trade-off values to find an initial plan set, which is then
improved by the ICP-sequential runs. As Metric-LPG is a stochastic local search planner, we run
it three times for each problem and average the results. In 77% and 70% of the 60 problems in the
three tested domains for the Hybrid and Sampling approaches respectively, the standard deviation
of the ICP values of plan sets is at most 5% of the average value. This indicates that the ICP values
of plan sets in different runs are quite stable. As the Hybrid approach is an improved version of
ICP-sequential and gives better results in almost all tested problems, we omit ICP-sequential
from the discussion below. We now analyze the results in more detail.
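For concreteness, weights for the three distributions could be drawn as in the sketch below. It assumes that both triangular distributions are supported on [0, 1] with their mode at 0.2 and 0.8 respectively; the text fixes only the modes, so the end points are an assumption, and `sample_weights` is a name introduced for this example.

```python
import random

def sample_weights(name, n, rng):
    """Draw n trade-off values w for the 'uniform', 'w02' or 'w08' distribution."""
    if name == "uniform":
        return [rng.uniform(0.0, 1.0) for _ in range(n)]
    mode = {"w02": 0.2, "w08": 0.8}[name]
    # random.triangular(low, high, mode): assumed support [0, 1].
    return [rng.triangular(0.0, 1.0, mode) for _ in range(n)]

rng = random.Random(0)
for name in ("uniform", "w02", "w08"):
    ws = sample_weights(name, 10, rng)   # ten values, as in the Sampling approach
    print(name, [round(w, 2) for w in ws])
```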

The utility of using the partial knowledge of user's preferences: To evaluate the utility of taking
partial preferences into account, we first compare our results against naive approaches that
generate a plan set without explicitly taking into account the partial preference model. Specifically,
we run the default LPG planner with different random seeds to find multiple non-dominated
plans. The LPG planner was run with both the speed setting (LPG-speed), which finds plans quickly,
and the diverse setting (LPG-d), which takes longer to find a better set of diverse plans. Figure 10
shows the comparison between the quality of plan sets returned by Sampling and by those naive
approaches when the distribution of the trade-off value $w$ between makespan and plan cost is
assumed to be uniform. Overall, among the 20 tested problems for each of the ZenoTravel, DriverLog,
and Depots domains, the Sampling approach is better than LPG-speed in 19/20, 20/20 and 20/20
problems and is better than LPG-d in 18/20, 18/20, and 20/20 problems respectively. We observed
similar results comparing Hybrid with those two approaches: in particular, the Hybrid approach is
better than LPG-speed in all 60 problems and better than LPG-d in 19/20, 18/20, and 20/20 problems
respectively. These results support our intuition that taking into account the partial knowledge
about the user's preferences (if it is available) increases the quality of the plan set.

Figure 9: The distributions: (a) uniform, (b) w02, (c) w08 (see text).

Figure 10: Results for the ZenoTravel, DriverLog and Depots domains comparing the Sampling and baseline LPG
approaches on the overall ICP value (log scale) with the uniform distribution.

Comparing the Sampling and Hybrid approaches: We now compare the effectiveness of the 
Sampling and Hybrid approaches in terms of the quality of returned plan sets with the uniform, 
w02 and w08 distributions. 

ICP value: We first compare the two approaches in terms of the ICP values of the plan sets returned,
which indicate their quality as evaluated by the user. Tables 6, 7 and 8 show the results in the three
domains ZenoTravel, DriverLog and Depots. In general, Hybrid tends to be better than Sampling on
this criterion for most of the domains and distributions. In particular, in the ZenoTravel domain it
returns higher quality plan sets in 15/20 problems when the distribution is uniform, and in 10/20 and
13/20 problems when it is w02 and w08 respectively (both approaches return plan sets with
equal ICP values for two problems with the w02 and one problem with the w08 distribution).
(a) uniform                   (b) w02                       (c) w08
Prob   Sampling     Hybrid    Prob   Sampling     Hybrid    Prob   Sampling     Hybrid
1*       840.00     839.98    1        972.00     972.00    1        708.00     708.00
2*     2,661.43   2,661.25    2      3,067.20   3,067.20    2*    2,255.792  2,255.788
3*     1,807.84   1,805.95    3*     2,083.91   2,083.83    3*     1,535.54   1,535.32
4*     3,481.31   3,477.49    4*     4,052.75   4,026.92    4*     2,960.84   2,947.66
5*     3,007.97   2,743.85    5*     3,171.86   3,171.73    5*     2,782.16   2,326.94
6*     3,447.37   2,755.25    6*     4,288.00   3,188.61    6*     2,802.00   2,524.18
7*     4,006.38   3,793.44    7*     4,644.40   4,377.40    7*     3,546.95   3,235.63
8*     4,549.90   4,344.70    8*     5,060.81   5,044.43    8*     3,802.60   3,733.90
9*     6,397.32   5,875.13    9*     7,037.87   6,614.30    9*     5,469.24   5,040.88
10*    7,592.72   6,826.60    10*    9,064.40   7,472.37    10*    6,142.68   5,997.45
11*    5,307.04   5,050.07    11*    5,946.68   5,891.76    11*    4,578.09   4,408.36
12*    7,288.54   6,807.28    12*    7,954.74   7,586.28    12     5,483.19   5,756.89
13*   10,208.11   9,956.94    13*   11,847.13  11,414.88    13*    8,515.74   8,479.09
14    11,939.22  13,730.87    14    14,474.00  15,739.19    14*   11,610.38  11,369.46
15     9,334.68  13,541.28    15    16,125.70  16,147.28    15*   11,748.45  11,418.59
16*   16,724.21  13,949.26    16    19,386.00  19,841.67    16    14,503.79  15,121.77
17*   27,085.57  26,822.37    17    29,559.03  32,175.66    17    21,354.78  22,297.65
18    23,610.71  25,089.40    18    28,520.17  29,020.15    18    20,107.03  21,727.75
19    29,114.30  29,276.09    19    34,224.02  36,496.40    19    23,721.90  25,222.24
20    34,939.27  37,166.29    20    39,443.66  42,790.97    20    28,178.45  28,961.51

Table 6: The ICP value of plan sets in ZenoTravel domain returned by the Sampling and Hybrid approaches with the
distributions (a) uniform, (b) w02 and (c) w08. The problems where Hybrid returns plan sets with better quality than
Sampling are marked with (*).



In the DriverLog domain, Hybrid returns better plan sets for 11/20 problems with the uniform
distribution (and for three other problems the plan sets have equal ICP values), but does worse with
the triangular distributions: 8/20 (with another 2 ties) and 9/20 (with another tie) for w02 and
w08 respectively. The improvement in the quality of plan sets that Hybrid contributes is more
significant in the Depots domain: it is better than Sampling in 11/20 problems with the uniform
distribution (and equal in 3 problems), and in 12/20 problems with each of the w02 and w08
distributions (both approaches return plan sets with equal ICP values for 4 problems with w02,
and for 2 problems with w08).

In many large problems of the ZenoTravel and DriverLog domains where Sampling performs
better than Hybrid, we notice that the first phase of the Hybrid approach, which searches for the
3 initial plans, normally takes most of the allocated time, and therefore there is not much time
left for the second phase to improve the quality of the plan set. We also observe that among the
three settings of the trade-off distribution, the positive effect of the second phase of the Hybrid
approach (which is to improve the quality of the initial plan sets) tends to be more stable across
different domains with the uniform distribution, but less so with the triangular ones; in particular,
Sampling beats Hybrid in the DriverLog domain when the distribution is w02. Perhaps this is because,
with the triangular distributions, the chance that the LPG planner (which is used in our Sampling
approach) returns the same plans even with different trade-off values increases, especially when the
most probable weight of makespan happens to lie in a (wide) range of weights in which one
single plan is optimal. This result agrees with the intuition that when the knowledge about the user's
preferences is almost complete (i.e. the distribution of the trade-off value is "peaked"), the Sampling
approach with a smaller number of generated weight values may be good enough (assuming that a
good planner optimizing a complete value function is available).
(a) uniform                   (b) w02                       (c) w08
Prob   Sampling     Hybrid    Prob   Sampling     Hybrid    Prob   Sampling     Hybrid
1        212.00     212.00    1        235.99     236.00    1        188.00     188.00
2*       363.30     348.38    2*       450.07     398.46    2*       333.20     299.70
3        176.00     176.00    3        203.20     203.20    3        148.80     148.80
4*       282.00     278.45    4*       336.01     323.79    4*       238.20     233.20
5*       236.83     236.33    5        273.80     288.51    5*       200.80     199.52
6*       222.00     221.00    6        254.80     254.80    6*       187.47     187.20
7        176.50     176.50    7*       226.20     203.80    7        149.20     149.20
8*       338.96     319.43    8        387.53     397.75    8        300.54     323.87
9*       369.18     301.72    9*       420.64     339.05    9*       316.80     263.92
10*      178.38     170.55    10*      196.44     195.11    10*      158.18     146.12
11*      289.04     232.65    11*      334.13     253.09    11*      245.38     211.60
12       711.48     727.65    12*      824.17     809.93    12*      605.86     588.82
13*      469.50     460.99    13       519.92     521.05    13       388.80     397.67
14       457.04     512.11    14       524.56     565.94    14       409.02     410.53
15*      606.81     591.41    15*      699.49     643.72    15       552.79     574.95
16     4,432.21   4,490.17    16     4,902.34   6,328.07    16     3,580.32   4,297.47
17     1,310.83   1,427.70    17     1,632.86   1,659.46    17     1,062.03   1,146.68
18*    1,800.49   1,768.17    18     1,992.32   2,183.13    18     1,448.36   1,549.09
19     3,941.08   4,278.67    19     4,614.13   7,978.00    19*    3,865.54   2,712.08
20     2,225.66   2,397.61    20     2,664.00   2,792.90    20     1,892.28   1,934.11

Table 7: The ICP value of plan sets in DriverLog domain returned by the Sampling and Hybrid approaches with the
distributions (a) uniform, (b) w02 and (c) w08. The problems where Hybrid returns plan sets with better quality than
Sampling are marked with (*).



(a) uniform                   (b) w02                       (c) w08
Prob   Sampling     Hybrid    Prob   Sampling     Hybrid    Prob   Sampling     Hybrid
1         27.87      27.87    1         28.56      28.56    1*        28.50      27.85
2         39.22      39.22    2         41.12      41.12    2         38.26      38.26
3*        51.36      50.43    3*        54.44      52.82    3*        49.49      48.58
4         43.00      43.00    4         46.00      46.00    4*        40.87      40.00
5         80.36      81.01    5         82.93      84.45    5         75.96      78.99
6         99.40     111.11    6        102.58     110.98    6         94.79      98.40
7*        38.50      38.49    7*        40.53      40.40    7*        37.04      36.60
8*        59.08      58.41    8*        62.15      62.08    8*        55.89      54.67
9         95.29     103.85    9        100.59     106.00    9         87.93      95.05
10*       52.04      50.00    10        52.40      52.40    10*       47.86      47.60
11       101.43     107.66    11*      110.18     108.07    11        97.56      99.06
12       123.09     129.34    12*      144.67     135.80    12       124.58     128.01
13*       57.37      57.22    13*       60.83      60.72    13        54.66      54.66
14*       62.75      62.33    14*       70.32      69.87    14*       65.20      62.02
15       116.82     117.86    15       113.15     124.28    15       101.09     124.43
16*       50.77      49.36    16*       54.98      54.12    16*       47.04      46.35
17*       38.38      37.77    17*       42.86      41.50    17*       37.56      36.92
18*       88.28      85.55    18*       94.53      90.02    18*       76.73      75.29
19*       82.60      82.08    19*       94.21      89.28    19*       74.73      72.45
20*      137.13     133.47    20*      150.80     135.93    20*      122.43     120.31

Table 8: The ICP value of plan sets in Depots domain returned by the Sampling and Hybrid approaches with the
distributions (a) uniform, (b) w02 and (c) w08. The problems where Hybrid returns plan sets with better quality than
Sampling are marked with (*).
                             Median of makespan     Median of cost
Domain       Distribution    S > H      H > S       S > H      H > S
ZenoTravel   uniform            3         17          16          4
             w02                6         12          14          4
             w08                6         13          13          6
DriverLog    uniform            6         11           7         11
             w02               10          8           8         10
             w08               10          7           9          9
Depots       uniform            9          8           9          7
             w02                7          9           5          9
             w08               11          7           7         11

Table 9: The numbers of problems for each domain, distribution and feature where Sampling (Hybrid) returns plan sets
with better (i.e. smaller) median of feature value than that of Hybrid (Sampling), denoted in the table by S > H (H > S,
respectively). We mark bold the numbers of problems that indicate the outperformance of the corresponding approach.



Since the quality of a plan set depends on how the two features makespan and plan cost are 
optimized, and how the plans "span" the space of time and cost, we also compare Sampling and 
Hybrid approaches in terms of those two criteria. In particular, we compare plan sets returned 
by the two approaches in terms of (i) their median values of makespan and cost, which represent 
how "close" the plan sets are to the origin of the space of makespan and cost, and (ii) their 
standard deviation of makespan and cost values, which indicate how the sets span each feature 
axis. 
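The two criteria can be computed per plan set as in the following sketch. It assumes that a plan set is a list of (makespan, cost) pairs and uses the population standard deviation; the text does not say whether the population or the sample form was used, so that choice is an assumption. Both example sets are hypothetical.

```python
from statistics import median, pstdev

def summarize(plan_set):
    """Median (closeness to the origin) and spread of each feature of a plan set."""
    makespans = [t for t, _ in plan_set]
    costs = [c for _, c in plan_set]
    return {
        "median_makespan": median(makespans),
        "median_cost": median(costs),
        "sd_makespan": pstdev(makespans),  # population SD: an assumption
        "sd_cost": pstdev(costs),
    }

# Hypothetical plan sets returned by two approaches for one problem.
set_s = [(11, 6), (12, 5), (13, 5)]
set_h = [(4, 25), (7, 15), (12, 5)]
stats_s, stats_h = summarize(set_s), summarize(set_h)

# Smaller median is better; larger standard deviation means a wider span.
print("better median makespan:", "S" if stats_s["median_makespan"] < stats_h["median_makespan"] else "H")
print("wider makespan span:", "S" if stats_s["sd_makespan"] > stats_h["sd_makespan"] else "H")
```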

Table 9 summarizes, for each domain, distribution and feature, the number of problems in
which each approach (either Sampling or Hybrid) generates plan sets with a better median of each
feature value (makespan and plan cost) than the other. There are 60 problems across 3 different
distributions, so in total 180 cases for each feature. Sampling and Hybrid return plan sets with
better makespan in 40 and 62 cases, and with better plan cost in 52 and 51 cases (respectively),
which indicates that Hybrid is slightly better than Sampling on optimizing makespan but is
possibly worse on optimizing plan cost. In the ZenoTravel domain, for all distributions Hybrid is
more likely to return better plan sets on makespan than Sampling, and Sampling is better on the plan
cost feature. In the DriverLog domain, Sampling is better on the makespan feature with both non-uniform
distributions, but worse than Hybrid with the uniform one. On the plan cost feature, Hybrid returns
plan sets with a better median than Sampling with the uniform and w02 distributions, and both
approaches perform equally well with the w08 distribution. In the Depots domain, Sampling is better
than Hybrid on both features with the uniform distribution, and only better than Hybrid on
makespan with the w08 distribution.

In terms of spanning plan sets, Hybrid performs much better than Sampling on both features
across the three domains, as shown in Table 10. In particular, over 360 cases for both the makespan
and plan cost features, there are only 10 cases where Sampling produces plan sets with a better
standard deviation than Hybrid on each feature. Hybrid, on the other hand, generates plan sets
with a better standard deviation on makespan in 91 cases, and on plan cost in 85 cases.

These experimental results support our arguments in Section 5.1 about the limits of the sampling
idea. Since one single plan could be optimal for a wide range of weight values, the search in the
Sampling approach with different trade-off values may focus on looking for plans only in the
same region of the feature space (specified by the particular value of the weight), which can
reduce the chance of finding plans with better values on some particular feature. In contrast,
the Hybrid approach tends to be better at spanning plan sets over a larger region of the space,
as the set of plans that have been found is taken into account during the search.

                             SD of makespan         SD of cost
Domain       Distribution    S > H      H > S       S > H      H > S
ZenoTravel   uniform            8         12           6         14
             w02                4         14           7         11
             w08                6         13           8         11
DriverLog    uniform            5         11           6         10
             w02                7         10           7          9
             w08                8          9          10          7
Depots       uniform           10          7           7          9
             w02                7          9           5         10
             w08                5         13           7         11

Table 10: The numbers of problems for each domain, distribution and feature where Sampling (Hybrid) returns plan
sets with better (i.e. larger) standard deviation of feature value than that of Hybrid (Sampling), denoted in the table
by S > H (H > S, respectively). We mark bold the numbers of problems that indicate the outperformance of the
corresponding approach.

Contribution to the lower convex hull: The comparison above between Sampling and Hybrid
considers the two features separately. We now examine the relation between plan sets returned
by those approaches on the joint space of both features, in particular taking into account
the dominance relation between plans in the two sets. In other words, we compare the relative
total number of plans in the lower convex hull (LCH) found by each approach. Given that this
is the set that should be returned to the user (to select one from), a higher number tends to
give her a better expected utility value. To measure the relative performance of both approaches
with respect to this criterion, we first create a set $S$ combining the plans returned by them. We
then compute the set $S_{lch} \subseteq S$ of plans in the lower convex hull among all plans in $S$. Finally,
we measure the percentages of plans in $S_{lch}$ that are actually returned by each of our tested
approaches. Figures 11, 12 and 13 show the contribution to the LCH of plan sets returned by
Sampling and Hybrid in the ZenoTravel, DriverLog and Depots domains.
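The measurement described above can be sketched as follows, assuming that plans are (makespan, cost) pairs and that a plan belongs to the lower convex hull when it minimizes $w \times t_p + (1 - w) \times c_p$ over a non-degenerate range of weights in [0, 1] (this characterization, the function names and the example plan sets are assumptions made for illustration). Since $V(p, w) = c_p + w (t_p - c_p)$ is linear in $w$, the range can be computed exactly from pairwise comparisons.

```python
def optimal_weight_range(p, plans):
    """Interval of w in [0, 1] where V(p, w) <= V(q, w) for every other plan q,
    or None if no interval of positive length exists."""
    lo, hi = 0.0, 1.0
    tp, cp = p
    for q in plans:
        if q == p:
            continue
        tq, cq = q
        a = (tp - cp) - (tq - cq)   # constraint: a * w <= b
        b = cq - cp
        if a > 0:
            hi = min(hi, b / a)
        elif a < 0:
            lo = max(lo, b / a)
        elif b < 0:
            return None             # q is better than p for every weight
    return (lo, hi) if lo < hi else None

def lch_contribution(set_a, set_b):
    """Lower convex hull of the combined set and each set's share of it (in %)."""
    combined = list(dict.fromkeys(set_a + set_b))
    hull = [p for p in combined if optimal_weight_range(p, combined) is not None]
    share = lambda s: 100.0 * sum(p in s for p in hull) / len(hull)
    return hull, share(set_a), share(set_b)

# Hypothetical plan sets for one problem.
sampling = [(12, 5), (10, 20)]
hybrid = [(4, 25), (7, 15), (12, 5)]
hull, pct_sampling, pct_hybrid = lch_contribution(sampling, hybrid)
print(hull, pct_sampling, pct_hybrid)
```

In this sketch a plan returned by both approaches is credited to both, so the two percentages need not sum to 100.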

In general, we observe that the plan set returned by Hybrid contributes more into the LCH 
than that of Sampling for most of the problems (except for some large problems) with most of the 
distributions and domains. Specifically, in the ZenoTravel domain, Hybrid contributes more plans to
the LCH than Sampling in 15/20, 13/20 (with another 2 ties) and 13/20 (another 2 ties) problems
for the uniform, w02 and w08 distributions, respectively. In the DriverLog domain, it is better than
Sampling in 10/20 (another 6 ties), 10/20 (another 4 ties) and 8/20 (another 5 ties) problems;
and in the Depots domain Hybrid is better in 11/20 (another 6 ties), 11/20 (another 4 ties) and
11/20 (another 4 ties) problems for the uniform, w02 and w08 distributions. Again, as with the ICP
value, the Hybrid approach is less effective on large problems (except with the w08
distribution in the Depots domain), in which most of the search time is spent finding the initial plan
sets. We also note that a plan set with a higher contribution to the LCH is not guaranteed to have
better quality, except in the extreme case where one plan set contributes 100% and completely
dominates the other, which contributes 0% to the LCH. For example, consider problem 14 in the
ZenoTravel domain: even though the plan sets returned by Hybrid contribute more than those of
Sampling under all three distributions, only with w08 does Hybrid have a better ICP value. The reason
is that the ICP value also depends on the range of trade-off values (and its density) for
which a plan in the LCH is optimal, whereas the LCH is constructed by simply comparing plans
in terms of their makespan and cost separately (i.e. using the dominance relation), ignoring their
relative importance.

Figure 11: The contribution into the common lower convex hull of plan sets in ZenoTravel domain with
different distributions. Panels: uniform distribution, triangular distribution (0.2), triangular
distribution (0.8); horizontal axis: problem.

Figure 12: The contribution into the common lower convex hull of plan sets in DriverLog domain with
different distributions. Panels: uniform distribution, triangular distribution (0.2), triangular
distribution (0.8); horizontal axis: problem.
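The gap between contributing to the LCH and having a good ICP value can be made concrete with a small sketch. It is not the authors' implementation. It assumes that each plan is a (makespan, cost) pair, that the value of a plan under trade-off weight w is w × makespan + (1 − w) × cost, that a lower ICP value is better, and that the ICP of a plan set is the expected value of its best plan under the density of w. The function names and the toy plan sets are hypothetical.

```python
from typing import Callable, List, Tuple

Plan = Tuple[float, float]  # (makespan, cost); both are minimized


def cross(o: Plan, a: Plan, b: Plan) -> float:
    """Positive when o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_convex_hull(plans: List[Plan]) -> List[Plan]:
    """Plans that are uniquely optimal for some range of w in [0, 1]."""
    front: List[Plan] = []
    for p in sorted(set(plans)):            # by makespan, then cost
        if not front or p[1] < front[-1][1]:
            front.append(p)                 # keep non-dominated plans only
    hull: List[Plan] = []
    for p in front:                         # monotone-chain lower hull
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def lch_contribution(set_a: List[Plan], set_b: List[Plan]) -> Tuple[int, int]:
    """How many plans on the common hull come from each set
    (a plan present in both sets is counted for both)."""
    hull = lower_convex_hull(list(set_a) + list(set_b))
    return (sum(1 for p in hull if p in set_a),
            sum(1 for p in hull if p in set_b))


def uniform_pdf(w: float) -> float:
    return 1.0


def triangular_pdf(mode: float) -> Callable[[float], float]:
    """Triangular density on [0, 1] peaking at `mode` (assumed shape of w02/w08)."""
    def pdf(w: float) -> float:
        return 2 * w / mode if w <= mode else 2 * (1 - w) / (1 - mode)
    return pdf


def icp(plans: List[Plan], density: Callable[[float], float],
        steps: int = 10000) -> float:
    """Midpoint-rule estimate of the expected value of the best plan in the set."""
    total = 0.0
    for i in range(steps):
        w = (i + 0.5) / steps
        best = min(w * t + (1 - w) * c for t, c in plans)
        total += density(w) * best / steps
    return total


# Toy data: set_a places two plans on the common hull, set_b only one,
# yet set_b has the lower (better) ICP value under the uniform density.
set_a = [(1.0, 100.0), (100.0, 1.0)]
set_b = [(10.0, 10.0)]
contribution = lch_contribution(set_a, set_b)
icp_a, icp_b = icp(set_a, uniform_pdf), icp(set_b, uniform_pdf)
```

In this toy case the two plans of set_a are each optimal only near the extremes of w, while the single plan of set_b is the better choice over a wide middle range, which is the kind of situation the paragraph above describes.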

The sensitivity of plan sets to the distributions: All of the analysis so far compares
the effectiveness of the approaches with respect to a particular distribution of the trade-off
value. In this part, we examine how sensitive the plan sets are to different
distributions.

Optimizing high-priority feature: We first consider how plan sets are optimized on each feature
(makespan and plan cost) by each approach with respect to the two non-uniform distributions w02
and w08. These distributions represent scenarios where the user gives different priorities
to the features, and plan sets should be biased towards optimizing the feature that has higher
priority (i.e. a larger weight). In particular, plans generated using the w08 distribution
should have better (i.e. smaller) makespan values than those found with the w02 distribution
(since makespan has higher priority in w08 than in w02); on the other hand, plan sets
returned with w02 should have better values of plan cost than those with w08.

Table 11 summarizes, for each domain, approach and feature, the number of problems in
which plan sets returned with one distribution (either w02 or w08) have a better median value
than with the other. We observe that for both features, the Sampling approach is very likely to
"push" plan sets to regions of the makespan-cost space with better values of the feature that the
user cares more about. On the other hand, the Hybrid approach tends to be more sensitive to the
distributions on both features in the ZenoTravel domain, and is more sensitive only on the makespan
feature in the DriverLog and Depots domains. These results generally show that our approaches can
bias the search towards optimizing features that are more desired by the user.
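The per-problem comparison behind these counts can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation script: the input layout (one list of feature values per problem), the function name, and the choice to leave ties uncounted are all hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def count_better_median(values_w02: Sequence[List[float]],
                        values_w08: Sequence[List[float]]) -> Tuple[int, int]:
    """Count problems where the plan set found with w02 (resp. w08) has the
    smaller, i.e. better, median feature value. Ties are not counted."""
    w02_better = 0
    w08_better = 0
    for v02, v08 in zip(values_w02, values_w08):
        m02, m08 = median(v02), median(v08)
        if m02 < m08:
            w02_better += 1
        elif m08 < m02:
            w08_better += 1
    return w02_better, w08_better


# Toy makespan values for three problems (made-up numbers, for illustration only).
makespan_w02 = [[10, 12, 14], [5, 6, 7], [3, 3, 3]]
makespan_w08 = [[8, 9, 10], [6, 7, 8], [3, 3, 3]]
counts = count_better_median(makespan_w02, makespan_w08)
```

Running the same function on plan cost values instead of makespan values gives the other pair of columns.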
Figure 13: The contribution into the common lower convex hull of plan sets in Depots domain with different distributions.





                          Median of makespan       Median of cost
Approach   Domain         w02 > w08   w08 > w02    w02 > w08   w08 > w02
Sampling   ZenoTravel          5          13           11           8
Sampling   DriverLog           6          10           13           5
Sampling   Depots              6          12           10           7
Hybrid     ZenoTravel          5          10           10           4
Hybrid     DriverLog           4          10            6           9
Hybrid     Depots              8          10            4          11

Table 11: The number of problems for each approach, domain and feature where plan sets returned with
the w02 (w08) distribution have a better (i.e. smaller) median feature value than those with w08
(w02), denoted in the table by w02 > w08 (w08 > w02, respectively). For each approach, we mark bold
the numbers for domains in which there are more problems whose plan sets returned with w08 (w02)
have a better makespan (plan cost) median than those with w02 (w08, respectively).



Spanning plan sets on individual features: Next, we examine how plan sets span each feature, 
depending on the degree of incompleteness of the distributions. Specifically, we compare the 
standard deviation of plan sets returned using the uniform distribution with those generated using 
the distributions w02 and w08. Intuitively, we expect that plan sets returned with the uniform 
distribution would have higher standard deviation than those with the distributions w02 and w08. 

Table 12 shows, for each approach, domain and feature, the number of problems whose plan sets
generated with the uniform distribution have a better (i.e. higher) standard deviation on the
feature than those found with the distribution w02. We observe that on the makespan feature, both
approaches return plan sets that are more "spanned" in the Depots domain, but not in ZenoTravel
and DriverLog. On the plan cost feature, Hybrid shows this positive effect in all three domains,
whereas Sampling shows it in the ZenoTravel and Depots domains. Similarly, Table 13 shows
the results comparing the uniform and w08 distributions. This time, Sampling returns plan sets
with better standard deviation on both features in the ZenoTravel and Depots domains, but not in
DriverLog. Hybrid also shows this in the ZenoTravel domain, but in the remaining two domains it
tends to return plan sets with the expected standard deviation on the plan cost feature only. From
all of these results, we observe that with the uniform distribution, both approaches are likely to
generate plan sets that span better than with non-uniform distributions, especially on the plan
cost feature.
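The reading of Tables 12 and 13 given above can be reproduced mechanically. The sketch below copies the counts from the two tables as (U > w, w > U) pairs and lists the domains in which the uniform distribution wins on more problems; the dictionary layout and the function name are illustrative choices, not part of the original evaluation.

```python
from typing import Dict, List, Tuple

Counts = Dict[Tuple[str, str], Dict[str, Tuple[int, int]]]

# (uniform better, w02 better) problem counts from Table 12.
TABLE_12: Counts = {
    ("Sampling", "makespan"): {"ZenoTravel": (9, 10), "DriverLog": (6, 8), "Depots": (9, 6)},
    ("Sampling", "cost"):     {"ZenoTravel": (10, 1), "DriverLog": (7, 8), "Depots": (8, 7)},
    ("Hybrid", "makespan"):   {"ZenoTravel": (9, 10), "DriverLog": (6, 9), "Depots": (8, 6)},
    ("Hybrid", "cost"):       {"ZenoTravel": (12, 7), "DriverLog": (8, 7), "Depots": (9, 4)},
}

# (uniform better, w08 better) problem counts from Table 13.
TABLE_13: Counts = {
    ("Sampling", "makespan"): {"ZenoTravel": (11, 8), "DriverLog": (5, 10), "Depots": (12, 7)},
    ("Sampling", "cost"):     {"ZenoTravel": (15, 4), "DriverLog": (5, 9), "Depots": (12, 6)},
    ("Hybrid", "makespan"):   {"ZenoTravel": (10, 9), "DriverLog": (7, 7), "Depots": (5, 8)},
    ("Hybrid", "cost"):       {"ZenoTravel": (15, 4), "DriverLog": (8, 6), "Depots": (11, 4)},
}


def uniform_spans_more(table: Counts, approach: str, feature: str) -> List[str]:
    """Domains where the uniform distribution gives the higher standard
    deviation on strictly more problems than the non-uniform one."""
    return [domain for domain, (u, w) in table[(approach, feature)].items() if u > w]


hybrid_cost_w02 = uniform_spans_more(TABLE_12, "Hybrid", "cost")
sampling_makespan_w08 = uniform_spans_more(TABLE_13, "Sampling", "makespan")
```

For plan cost, the uniform distribution wins in most approach and domain combinations in both tables, which matches the closing observation of the paragraph above.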

In summary, the experimental results in this section support the following hypotheses: 

• Instead of ignoring the user's partially specified preference models, one should take
them into account during plan generation, as the plan sets returned would have better quality.
                          SD of makespan        SD of cost
Approach   Domain         U > w02   w02 > U     U > w02   w02 > U
Sampling   ZenoTravel        9         10          10         1
Sampling   DriverLog         6          8           7         8
Sampling   Depots            9          6           8         7
Hybrid     ZenoTravel        9         10          12         7
Hybrid     DriverLog         6          9           8         7
Hybrid     Depots            8          6           9         4

Table 12: The numbers of problems for each approach, domain and feature where plan sets returned with
the uniform (w02) distribution have a better (i.e. higher) standard deviation of the feature value
than those with w02 (uniform), denoted in the table by U > w02 (w02 > U, respectively). For each
approach and feature, we mark bold the numbers for domains in which there are more problems whose
plan sets returned with the uniform distribution have a better standard deviation value of the
feature than those with the w02 distribution.





                          SD of makespan        SD of cost
Approach   Domain         U > w08   w08 > U     U > w08   w08 > U
Sampling   ZenoTravel       11          8          15         4
Sampling   DriverLog         5         10           5         9
Sampling   Depots           12          7          12         6
Hybrid     ZenoTravel       10          9          15         4
Hybrid     DriverLog         7          7           8         6
Hybrid     Depots            5          8          11         4

Table 13: The numbers of problems for each approach, domain and feature where plan sets returned with
the uniform (w08) distribution have a better (i.e. higher) standard deviation of the feature value
than those with w08 (uniform), denoted in the table by U > w08 (w08 > U, respectively). For each
approach and feature, we mark bold the numbers for domains in which there are more problems whose
plan sets returned with the uniform distribution have a better standard deviation value of the
feature than those with the w08 distribution.
• In generating plan sets sequentially to cope with partial preference models, the Sampling
approach, which searches for plans separately and independently of the solution space, tends to
return plan sets of worse quality than the Hybrid approach.

• The plan sets returned by the Hybrid approach tend to be more sensitive to the user's
preference models than those found by the Sampling approach.



6. Related Work 



Currently there are very few research efforts in the planning literature that explicitly consider
incompletely specified user preferences during planning. The usual approach for handling multiple
objectives is to assume that a specific way of combining the objectives is available (Refanidis and
Vlahavas, 2003; Do and Kambhampati, 2003), and to search for one optimal plan with respect to this
function. Brafman & Chernyavsky (2005) discuss a CSP-based approach to find a plan for the most
preferred goal state given qualitative preferences on goals. There are no action cost and makespan
measurements such as those in our problem setting. Other relevant work includes Bryce et al. (2007),
in which the authors devise a variant of the LAO* algorithm to search for a conditional plan with
multiple execution options for each observation branch that are non-dominated with respect to
objectives like probability and cost to reach the goal.

In the context of decision-theoretic planning, some work has focused on scenarios where
the value function is not completely defined, in particular due to incompleteness in specifying
the reward function. In those cases, one approach is to search for the most robust policy
under different robustness criteria (e.g., Delage and Mannor, 2007; Regan and Boutilier, 2010;
Nilim and Ghaoui, 2005). The idea of searching for sets of policies has also been considered
recently in reinforcement learning. Specifically, in (Natarajan and Tadepalli, 2005) the reward
function is incomplete with weight values changing over time, and a set of policies is searched
and stored so that whenever the weights change a new best policy can be found by improving
those in the set. On the other hand, Barrett and Narayanan (2008) provide Bellman equations for
the Q-values using all vectors on the convex hull to search for the whole Pareto set.

Our work on planning with partial user preferences is also related to work on preference
elicitation and decision making under uncertainty of preferences. For instance, Chajewska et al.
(2000) consider a decision making scenario where the utility function is assumed to be drawn
from a known distribution, and either a single best strategy or an elicitation question is
suggested based on the expected utility of the strategy and the value of information of the question.
Boutilier et al. (2010) consider a preference elicitation problem in which the user's preference
model is incomplete on both the set of features and the utility function. However, these scenarios
differ from ours in two important respects: we focus on efficient approaches to synthesizing
plans with respect to the partial preferences, whereas the "outcomes" or "configurations" in their
cases are considered given upfront (or obtainable at low cost), and we aim to search
for a set of plans based on a quality measure of plan sets (instead of a quality measure over
an individual outcome or configuration).

Our approach to generating diverse plan sets to cope with planning scenarios without knowledge
of the user's preferences is in the same spirit as (Tate et al., 1998) and Myers (Myers, 2006;
Myers and Lee, 1999), though for different purposes. Myers, in particular, presents an approach
to generate diverse plans in the context of her HTN planner by requiring the metatheory
of the domain to be available and using bias on the meta-theoretic elements to control
search (Myers and Lee, 1999). The metatheory of the domain is defined in terms of pre-defined
attributes and their possible values covering roles, features and measures. Our work differs from
hers in two respects. First, we focus on domain-independent distance measures. Second, we
consider the computation of diverse plans in the context of state-of-the-art domain-independent
planners.

The problem of finding multiple but similar plans has been considered in the context of
replanning. A recent effort in this direction is (Fox et al., 2006). Our work focuses on the problem
of finding diverse plans by a variety of distance measures when the user's preferences exist but
are completely unknown.



Outside the planning literature, our closest connection is to the work by Hebrard et al. (2005),
who solve the problem of finding similar/dissimilar solutions for CSPs without additional domain
knowledge. It is instructive to note that unlike CSP, where the number of potential solutions is
finite (albeit exponential), the number of distinct plans for a given problem can be infinite (since
we can have infinitely many non-minimal versions of the same plan). Thus, effective approaches
for generating diverse plans are even more critical. The challenges in finding interrelated plans
also bear some tangential similarities to the work in information retrieval on finding similar or
dissimilar documents (c.f. (Zhang et al., 2002)).



7. Conclusion and Future Work 

In this paper, we consider the planning problem with a partial user preference model in two
scenarios: one where the user's preferences are completely unknown, and one where only part of them is
given. We propose a general approach to this problem in which a set of plans is presented to the user,
from which she can select. For each type of incompleteness, we define a different quality
measure of plan sets and investigate approaches to generating plan sets with respect to that quality
measure. In the first scenario, when the user is known to have preferences over plans but the
details are completely unknown, we define the quality of plan sets as their diversity value,
specified with syntactic features of plans (their action sets, sequences of states, and sets of
causal links). We then consider generating diverse sets of plans using two state-of-the-art planners,
GP-CSP and LPG. The approaches we developed for supporting the generation of diverse plans in GP-CSP
are broadly applicable to other planners based on bounded-horizon compilation approaches for
planning. Similarly, the techniques we developed for LPG, such as biasing the relaxed plan heuristics
in terms of distance measures, could be applied to other heuristic planners. The experimental
results with GP-CSP explicate the relative difficulty of enforcing the various distance measures,
as well as the correlation among the individual distance measures (as assessed in terms of the
sets of plans they find). The experiments with LPG demonstrate the potential of planning using
heuristic local search in producing large sets of highly diverse plans.

When part of the user's preferences is given, in particular the set of features that the user is
interested in and the distribution of weights representing their relative importance, we propose
the usage of the Integrated Preference Function, and its special case the Integrated Convex Preference
function, to measure the quality of plan sets, and propose various heuristic approaches based
on the Metric-LPG planner (Gerevini et al., 2008) to find a good plan set with respect to this
measure. We show empirically that taking partial preferences into account does improve the
quality of the plan set returned to the user, and that our proposed approaches are sensitive to the
degree of preference incompleteness, represented by the distribution.

While a planning agent may well start with a partial preference model, in the long run we
would like the agent to be able to improve the preference model through repeated interactions
with the user. In our context, at the beginning when the degree of incompleteness of preferences
is high, the learning will involve improving the estimate of h(α) based on the feedback about
the specific plan that the user selects from the set returned by the system. This learning phase
is in principle well connected to the Bayesian parameter estimation approach, in the sense that
the whole distribution of the parameter vector, h(α), is updated after receiving feedback from the
user, taking into account the current distribution of all models (starting from a prior, for instance
the uniform distribution). Although such an interactive learning framework has been discussed
previously, as in Chajewska et al. (2001), the set of user's decisions in that work is assumed to
be given, whereas in planning scenarios the cost of plan synthesis should be incorporated into
our interactive framework, and the problem of presenting plan sets to the user also needs to be
considered. Recent work by Li et al. (2009) considered learning the user's preferences in planning,
but restricted to preference models that can be represented with hierarchical task networks.
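As a rough illustration of the Bayesian update mentioned above, the following sketch revises a discretized density over a single trade-off weight after the user selects one plan from a presented set. None of it comes from the paper: the grid discretization, the logistic (softmax) choice model with sharpness parameter beta, the linear plan value alpha × makespan + (1 − alpha) × cost, and all names are assumptions made purely for illustration.

```python
import math
from typing import List, Tuple

Plan = Tuple[float, float]  # (makespan, cost); both are minimized


def update_weights(grid: List[float], prior: List[float],
                   plans: List[Plan], chosen: int,
                   beta: float = 1.0) -> List[float]:
    """Posterior mass over the grid of trade-off weights after observing that
    the user picked plans[chosen]. Assumed choice model: the user picks a plan
    with probability proportional to exp(-beta * value of the plan)."""
    posterior = []
    for alpha, mass in zip(grid, prior):
        values = [alpha * t + (1 - alpha) * c for t, c in plans]
        norm = sum(math.exp(-beta * v) for v in values)
        likelihood = math.exp(-beta * values[chosen]) / norm
        posterior.append(mass * likelihood)   # Bayes rule, unnormalized
    total = sum(posterior)
    return [p / total for p in posterior]


# Uniform prior over nine candidate weights.
grid = [0.1 * k for k in range(1, 10)]
prior = [1.0 / len(grid)] * len(grid)

# The user picks the fast but expensive plan (index 0), which suggests that
# makespan matters more to her, so mass should shift towards larger weights.
plans = [(1.0, 5.0), (5.0, 1.0)]
posterior = update_weights(grid, prior, plans, chosen=0)
prior_mean = sum(a * p for a, p in zip(grid, prior))
posterior_mean = sum(a * p for a, p in zip(grid, posterior))
```

A full treatment would also have to account for the cost of synthesizing the plans that are shown to the user, which this sketch ignores.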
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